| author | Leonard Kugis <leonard@kug.is> | 2022-08-07 03:35:26 +0200 |
|---|---|---|
| committer | Leonard Kugis <leonard@kug.is> | 2022-08-07 03:35:26 +0200 |
| commit | b2c11d7e3bfb90c7b2a3c41886a3a1e754726150 (patch) | |
| tree | 2fd85927c0e63486c112fd4d441a7d1df250db90 /en_GB/Introduction to Machine Learning/introduction_to_machine_learning.md | |
| parent | f65ec3a75070feb4f19ed64b1afade68f50e6456 (diff) | |
GML: Added formulas for lectures 1-3
Diffstat (limited to 'en_GB/Introduction to Machine Learning/introduction_to_machine_learning.md')
| -rw-r--r-- | en_GB/Introduction to Machine Learning/introduction_to_machine_learning.md | 75 |

1 file changed, 75 insertions, 0 deletions
diff --git a/en_GB/Introduction to Machine Learning/introduction_to_machine_learning.md b/en_GB/Introduction to Machine Learning/introduction_to_machine_learning.md
new file mode 100644
index 0000000..0e17258
--- /dev/null
+++ b/en_GB/Introduction to Machine Learning/introduction_to_machine_learning.md
@@ -0,0 +1,75 @@

# Introduction to Machine Learning

## Collection of formulas

### Quadratic error function

$$ E(\textbf{w}) = \frac{1}{2} \sum\limits_{n=1}^{N} (y(x_n, \textbf{w}) - t_n)^2 $$

### Quadratic error function with regularization

$$ E(\textbf{w}) = \frac{1}{2} \sum\limits_{n=1}^{N} (y(x_n, \textbf{w}) - t_n)^2 + \frac{\lambda}{2} \left\|\textbf{w}\right\|^2 $$
$$ \lambda := \text{Penalty factor} $$

- Also known as "ridge regression"

### Gaussian distribution in 1-D

$$ \mathcal{N}(t | \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{(t - \mu)^2}{2 \sigma^2}\right) $$

### Probabilistic modelling: likelihood in 1-D

$$ p(t | x_0, \textbf{w}, \beta) = \mathcal{N}(t | y(\textbf{w}, x_0), \sigma^2) $$
$$ \beta = \frac{1}{\sigma^2} \quad (\textit{precision}) $$
$$ y(\textbf{w}, x_0) := \text{Output of the model at $x_0$ with parameters $\textbf{w}$} $$

### Probabilistic modelling: multidimensional likelihood

$$ p(\textbf{t} | \textbf{x}_0, \textbf{w}, \Sigma^{-1}) = \mathcal{N}(\textbf{t} | y(\textbf{w}, \textbf{x}_0), \Sigma) $$
$$ \Sigma := \text{Covariance matrix} $$
$$ y(\textbf{w}, \textbf{x}_0) := \text{Output of the model at $\textbf{x}_0$ with parameters $\textbf{w}$} $$

### Data likelihood

- Joint distribution over all data points together
- Individual data points are assumed to be independent

$$ L(\textbf{w}) = P(T | X, \textbf{w}, \beta) = \prod\limits_{n=1}^{N} \frac{1}{c} \exp\left(-\frac{(t_n - y(x_n, \textbf{w}))^2}{2 \sigma^2}\right) $$
$$ T := \text{Set of all target points (data)} $$
$$ X := \text{Set of all inputs} $$
$$ c := \text{Normalization constant} \quad (c = \sqrt{2 \pi \sigma^2}) $$
$$ N := \text{Number of data points} $$

### Parameter optimization from the data likelihood

$$ \text{maximize}\ L(\textbf{w}) \Leftrightarrow \text{minimize}\ -\log L(\textbf{w}) $$

- The sum-of-squares error is contained in $-\log L(\textbf{w})$; all other terms are constant in $\textbf{w}$
- It is therefore sufficient to minimize the sum-of-squares error

$$ \textbf{w}_{\text{ML}} = \operatorname{argmax}_{\textbf{w}} L(\textbf{w}) = \operatorname{argmin}_{\textbf{w}} \frac{1}{2} \sum\limits_{n=1}^{N} (y(x_n, \textbf{w}) - t_n)^2 $$

$$ \frac{1}{\beta_{\text{ML}}} = \frac{1}{N} \sum\limits_{n=1}^{N} (y(x_n, \textbf{w}_{\text{ML}}) - t_n)^2 $$

### Bayesian inference

$$ P(\textbf{w} | D) = \frac{P(D | \textbf{w}) P(\textbf{w})}{P(D)} $$
$$ P(\textbf{w} | D) := \text{Posterior} $$
$$ P(D | \textbf{w}) := \text{Likelihood (model as before)} $$
$$ P(\textbf{w}) := \text{A-priori probability for $\textbf{w}$ (higher probability for smaller parameters)} $$

### Parameter optimization for the Bayesian approach

$$ \text{maximize}\ P(\textbf{w} | D) \Leftrightarrow \text{minimize}\ -\log P(\textbf{w} | D) $$
$$ \textbf{w}_{\text{MAP}} = \operatorname{argmax}_{\textbf{w}} P(\textbf{w} | D) = \operatorname{argmin}_{\textbf{w}} \left( \frac{1}{2} \sum\limits_{n=1}^{N} (y(x_n, \textbf{w}) - t_n)^2 + \frac{\alpha}{2} \textbf{w}^{T} \textbf{w} \right) $$
$$ \alpha := \text{Hyperparameter (precision of the Gaussian prior over $\textbf{w}$), encoding the initial uncertainty} $$

## Definitions

### Likelihood

Function describing the joint probability of the data $\textbf{x}$ as a function of the parameters $\textbf{w}$ of the statistical model.
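To connect this definition back to the formulas above: maximizing $L(\textbf{w})$ is equivalent to minimizing the sum-of-squares error. The sketch below is an illustration on top of the notes, not part of the lecture material; it assumes NumPy and a polynomial model, and the helper names `design_matrix` and `fit_ml` are hypothetical.

```python
# Minimal maximum-likelihood curve-fitting sketch (assumes NumPy and a
# polynomial model y(x, w) = sum_j w_j x^j; names are illustrative only).
import numpy as np

def design_matrix(x, degree):
    # Column j holds x**j, so Phi @ w evaluates the polynomial at every x_n.
    return np.vander(x, degree + 1, increasing=True)

def fit_ml(x, t, degree):
    # w_ML = argmin_w 1/2 sum_n (y(x_n, w) - t_n)^2, solved by least squares.
    Phi = design_matrix(x, degree)
    w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    # 1/beta_ML = (1/N) sum_n (y(x_n, w_ML) - t_n)^2, the mean squared residual.
    residuals = Phi @ w_ml - t
    beta_ml = len(t) / np.sum(residuals ** 2)
    return w_ml, beta_ml

# Toy data: a noisy sine curve, the classic curve-fitting example.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)
w_ml, beta_ml = fit_ml(x, t, degree=3)
print(w_ml, beta_ml)
```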
### Bayesian approach

A probabilistic model is placed over the parameters themselves, not only over the observed data.
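The MAP estimate above is the simplest instance of this: a Gaussian prior over $\textbf{w}$ turns maximum likelihood into ridge regression. For a model that is linear in its parameters, $y = \Phi \textbf{w}$, the MAP objective is quadratic and has the closed-form solution $\textbf{w}_{\text{MAP}} = (\Phi^T \Phi + \alpha I)^{-1} \Phi^T \textbf{t}$. A minimal sketch of this (again an illustration using NumPy; `fit_map` is a hypothetical name):

```python
# Closed-form MAP / ridge solution for a linear-in-parameters model,
# following the objective 1/2 ||Phi w - t||^2 + alpha/2 w^T w from above.
import numpy as np

def fit_map(Phi, t, alpha):
    # Setting the gradient to zero gives (Phi^T Phi + alpha I) w = Phi^T t.
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + alpha * np.eye(d), Phi.T @ t)

# Usage on toy data (hypothetical): cubic polynomial basis on [0, 1].
x = np.linspace(0.0, 1.0, 10)
Phi = np.vander(x, 4, increasing=True)
t = np.sin(2.0 * np.pi * x)
w_map = fit_map(Phi, t, alpha=1e-3)
print(w_map)
```

Larger $\alpha$ pulls $\textbf{w}_{\text{MAP}}$ towards zero, which matches the remark that the prior assigns higher probability to smaller parameters.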