Regression Models: Logistic Regression


Definition

  • We have a mathematical function (a linear combination of the input variables) which gives a value between $-\infty$ and $+\infty$; to convert it to a value in (0,1), we apply a sigmoid function, also called the logistic function
  • We can visualize the model as a boundary (the decision boundary) that separates 2 categories; the boundary is a hyperplane in a space where each dimension is a variable (a certain type of information)
  • The algorithm used is also gradient descent
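
To make the three points above concrete, here is a minimal sketch in pure Python. It assumes batch gradient descent on the mean log loss; the toy data, the learning rate `lr`, the number of `epochs`, and the helper names `fit_logistic` and `predict` are illustrative choices of mine, not part of any library.

```python
import math

def sigmoid(z):
    """Map a real number z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w.x + b) by batch gradient descent (hypothetical helper)."""
    n, n_features = len(xs), len(xs[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            err = p - y  # derivative of the log loss with respect to the linear score
            for j in range(n_features):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        # Step against the gradient
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Class 1 if x lies on the positive side of the hyperplane w.x + b = 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0

# Toy, linearly separable data with two variables (made up for illustration)
xs = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [4.0, 4.5], [5.0, 4.0], [4.5, 5.0]]
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
preds = [predict(w, b, x) for x in xs]
print(w, b, preds)
```

The decision boundary is the set of points where `w.x + b = 0`, which is where the sigmoid outputs exactly 0.5.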

Common Questions

  1. What is a logistic function?
    Answer: The logistic (sigmoid) function is $\sigma(Z) = \frac{1}{1 + e^{-Z}}$.
  2. What is the range of values of a logistic function?
    Answer: The values of a logistic function range from 0 to 1 (both exclusive). The values of the input Z range from $-\infty$ to $+\infty$.
  3. What cost function is used for logistic regression?
    Answer: The standard choice is cross-entropy, also called log loss (two names for the same function). Note that MSE is not used: composing the squared error with the sigmoid makes the cost non-convex in the parameters, so local extrema can appear.
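
The answers above can be checked numerically with a short sketch. It assumes binary labels in {0, 1}; the helper name `log_loss` and the clipping constant `eps` are my own choices here, not a library API.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_loss(y_true, probs, eps=1e-12):
    """Mean cross-entropy between binary labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# The output stays between 0 and 1 while the input moves along the real line
probs = [sigmoid(z) for z in (-5.0, 0.0, 5.0)]

# Confident correct predictions cost little; confident wrong ones cost a lot
confident_right = log_loss([0, 1], [0.1, 0.9])
confident_wrong = log_loss([0, 1], [0.9, 0.1])
print(probs, confident_right, confident_wrong)
```

The steep penalty on confident mistakes is what gives gradient descent a strong signal even when the sigmoid itself is nearly flat.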

Basic Implementation

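from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load the iris data as a feature matrix X and a label vector y
X, y = load_iris(return_X_y=True)
# max_iter is raised above the default of 100 to avoid a possible convergence warning on unscaled features
clf = LogisticRegression(random_state=2, max_iter=1000).fit(X, y)
print(clf.predict(X[:2, :]))        # predicted class labels for the first two samples
print(clf.predict_proba(X[:2, :]))  # per-class probabilities; each row sums to 1
print(clf.score(X, y))              # mean accuracy on the training data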

Notes

In fact, logistic regression itself is simple; the key thing here is the mathematics behind gradient descent and its multi-dimensional variations. I'll discuss them in future posts.

Author

Zhenlin Wang

Posted on

2019-05-13

Updated on

2021-04-20

Licensed under