Classification is a supervised learning problem whose outputs are categorical, unlike linear regression, whose outputs are numerical.
Classification problems fall into several groups: binary classification, multiclass classification, multilabel classification, multioutput classification, etc.
Much like linear regression, logistic regression computes a weighted sum of the input features, but it then applies the sigmoid function to that sum to turn the output into a probability in the range (0, 1).
\[ \begin{gather} \text{Linear regression: } y = θ^{T}X\\ \text{Logistic regression: } p = \text{sigmoid}(θ^{T}X)\\ \text{sigmoid}(t) = \frac{1}{1+e^{-t}}\\ \text{logit}(p) = \log\left(\frac{p}{1-p}\right) = t \end{gather} \]
Cost function for one instance: \[ J(θ) = \begin{cases} -\log(p) & \text{if } y=1\\ -\log(1-p) & \text{if } y=0 \end{cases} \]
The cost function penalizes the model when it estimates a low probability for the true target class: if y = 1 and the model estimates p = 0.01, the cost is -log(0.01) ≈ 4.6, whereas p = 0.99 costs only -log(0.99) ≈ 0.01.
There is no closed-form equation to compute θ. We will use gradient descent to find the best weights.
Cost function for the whole training set (log loss), a convex function: \[J(θ) = -\frac{1}{m} \sum_{i=1}^{m}\left[y_i\log(p_i) + (1-y_i)\log(1-p_i)\right]\]
Gradient \[∇_{θ}J = \frac{1}{m}X^{T}\left[\text{sigmoid}(Xθ) - y\right]\]
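A minimal gradient-descent sketch of this update in NumPy; the synthetic data, the learning rate eta, and the iteration count are all assumptions for illustration:

import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

rng = np.random.default_rng(42)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]  # bias column + 2 toy features
y = (X[:, 1] + X[:, 2] > 0).astype(float)           # made-up, linearly separable labels

theta = np.zeros(X.shape[1])
eta, m = 0.1, len(X)                                # learning rate (assumed), sample count
for _ in range(1000):
    gradient = X.T @ (sigmoid(X @ theta) - y) / m   # the gradient formula above
    theta -= eta * gradient                         # gradient descent step

p = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)   # clip to keep the logs finite
print(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))  # log loss should end up small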
Assumption: the instances follow a Gaussian distribution around the mean of their class.
Assumption: the data is purely linear.
The more assumptions a model makes, the more biased it is.
Decision boundaries: for logistic regression, the decision boundary is the set of points where the estimated probability equals 50%.
Regularization in logistic regression: l1 or l2 penalties, controlled by the C parameter (the inverse of alpha, so a smaller C means stronger regularization).
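A quick sketch of how C behaves in scikit-learn; the C values and the single-feature setup are arbitrary choices for illustration:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris(as_frame=True)
X = iris.data[["petal width (cm)"]].values
y = (iris["target"] == 2)          # binary target: virginica or not

for C in (0.01, 1, 100):           # smaller C = stronger regularization
    clf = LogisticRegression(C=C)  # l2 is the default penalty; l1 needs a compatible solver such as liblinear
    clf.fit(X, y)
    print(C, clf.coef_.round(3))   # the weight shrinks toward 0 as C decreases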
Implement Logistic regression using sklearn:
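A minimal binary example under this heading; the choice of the virginica-vs-rest task on petal width is an assumption:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris(as_frame=True)
X = iris.data[["petal width (cm)"]].values
y = (iris["target"] == 2)                      # True for virginica

log_reg = LogisticRegression()
log_reg.fit(X, y)

X_new = np.array([[0.5], [1.5], [2.5]])        # a few petal widths to probe
print(log_reg.predict_proba(X_new).round(3))   # columns: P(not virginica), P(virginica)
print(log_reg.predict(X_new))                  # class flips where P(virginica) crosses 50%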
Logistic regression can be generalized to support multiclass classification directly. This is called softmax regression.
Given an instance x, the strategy is as follows:
Softmax score for class k \[s_k(x) = (θ^{(k)})^{T}X\]
Each class has its own parameter vector θ^{(k)}. The parameter matrix Θ contains the parameter vectors of all the classes.
Softmax function for class k: \[p_k = σ(s(x))_k = \frac{\exp(s_k(x))}{\sum\limits_{j=1}^{K} \exp(s_j(x))}\]
Choose the class with the highest probability: \[\hat{y} = \underset{k}{\operatorname{argmax}}\; σ(s(x))_k = \underset{k}{\operatorname{argmax}}\; s_k(x) = \underset{k}{\operatorname{argmax}}\; (θ^{(k)})^{T}X\]
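A small NumPy sketch of these three steps; the parameter matrix and the instance are made-up numbers:

import numpy as np

Theta = np.array([[0.5, -1.0, 2.0],        # hypothetical parameters: row k holds θ^(k)
                  [1.0, 0.2, -0.5],
                  [-2.0, 1.5, 0.3]])
x = np.array([1.0, 2.0, 0.5])              # one instance (bias term included)

s = Theta @ x                              # softmax scores s_k(x)
p = np.exp(s) / np.exp(s).sum()            # softmax probabilities p_k
print(s.round(3), p.round(3), p.argmax())  # predicted class = argmax of p (or of s)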
Just like logistic regression, softmax regression has a cost function to minimize, called the cross entropy.
Cross entropy cost function
\[ J(Θ) = −\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}y_{k}^{(i)}\log\left(p_{k}^{(i)}\right) \]
Cross entropy gradient vector for class k
\[ ∇_{θ^{(k)}}J = \frac{1}{m}\sum_{i=1}^{m}\left(p_{k}^{(i)} − y_{k}^{(i)}\right)x^{(i)} \]
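A sketch of the cost and its per-class gradients with one-hot targets; the toy shapes and random data are assumptions:

import numpy as np

rng = np.random.default_rng(0)
m, n, K = 6, 3, 3                     # instances, features, classes (toy sizes)
X = rng.normal(size=(m, n))
Y = np.eye(K)[rng.integers(0, K, m)]  # one-hot targets y_k^(i)
Theta = rng.normal(size=(n, K))       # column k holds θ^(k)

S = X @ Theta                                         # scores, shape (m, K)
P = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)  # softmax probabilities
J = -np.mean(np.sum(Y * np.log(P), axis=1))           # cross-entropy cost
grad = X.T @ (P - Y) / m                              # column k is the gradient for class k
print(J.round(4), grad.shape)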
Implement Softmax regression using sklearn:
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# use two petal features and the three-class target
X = iris.data[["petal length (cm)", "petal width (cm)"]].values
y = iris["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# with the default solver, LogisticRegression handles multiclass targets via softmax
softmax = LogisticRegression(max_iter=1000, C=30)
softmax.fit(X_train, y_train)

print(softmax.predict([X_test[0]]))
print(softmax.predict_proba([X_test[0]]).round(4))

[1]
[[0. 0.9827 0.0173]]
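As a quick follow-up, the integer prediction can be mapped back to a class name (the query point here is made up):

print(iris.target_names[softmax.predict([[5.0, 2.0]])])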