Revised Optimization Notes
@@ -33,7 +33,8 @@ $$
## Cross Entropy Loss derivation

Cross entropy[^wiki-cross-entropy] is the measure of *"surprise"* we get when
modeling results from distribution $p$ with distribution $q$.
It is defined as the entropy of $p$ plus the
[Kullback-Leibler Divergence](#kullback-leibler-divergence) between $p$ and $q$.

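In symbols, this definition reads:

$$
H(p, q) = H(p) + D_{KL}(p \parallel q)
$$
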
@@ -62,6 +63,23 @@ Usually $\hat{y}$ comes from using a
[softmax](./../3-Activation-Functions/INDEX.md#softmax). Moreover, since it uses a
logarithm and probability values are at most 1, the closer the predicted probability
of the true class is to 0, the higher the loss.

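To make that behaviour concrete, here is a minimal NumPy sketch (the logits and
names are my own, not from the notes):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    # Subtract the max for numerical stability
    e = np.exp(z - z.max())
    return e / e.sum()

# One-hot target y and predicted distribution y_hat from a softmax
y = np.array([0.0, 1.0, 0.0])
y_hat = softmax(np.array([0.5, 2.0, -1.0]))

# Cross entropy: only the log-probability of the true class contributes
loss = -np.sum(y * np.log(y_hat))
print(loss)  # grows as the true class probability approaches 0
```
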
## Computing PCA[^wiki-pca]
> [!CAUTION]
> $X$ here is the dataset matrix with **<ins>features over rows</ins>**,
> i.e. one column per datapoint

- $\Sigma = \frac{X \times X^T}{N} \coloneqq$ Covariance Matrix approximation
  (assuming $X$ is mean-centered)
- $\vec{\lambda} \coloneqq$ vector of eigenvalues of $\Sigma$
- $\Lambda \coloneqq$ matrix with the eigenvectors of $\Sigma$ as columns,
  sorted by decreasing eigenvalue
- $\Lambda_{red} \coloneqq$ eigenvector matrix reduced to the eigenvectors of
  the $k$ highest eigenvalues
- $Z = \Lambda_{red}^T \times X \coloneqq$ Compressed representation
  (see the sketch after this list)

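A minimal NumPy sketch of the steps above, assuming the features-over-rows
convention from the caution box (the random data and variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))          # 5 features over rows, 100 datapoints
X = X - X.mean(axis=1, keepdims=True)  # center each feature

Sigma = (X @ X.T) / X.shape[1]         # covariance matrix approximation (5x5)

# eigh is specialized for symmetric matrices like Sigma,
# and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]      # sort by decreasing eigenvalue
Lambda = eigvecs[:, order]

k = 2
Lambda_red = Lambda[:, :k]             # keep the k principal eigenvectors
Z = Lambda_red.T @ X                   # compressed representation, shape (k, 100)
print(Z.shape)
```

The notebook added by this commit (below) runs the same computation on a small
dataset that instead stores datapoints over rows.
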
> [!NOTE]
> You may have studied PCA in terms of SVD, the Singular Value Decomposition. The
> two are closely related and apply the same concept through different
> mathematical formulations.

## Laplace Operator[^khan-1]
It is defined as $\nabla \cdot \nabla f \in \R$ and is equivalent to the
@@ -80,8 +98,32 @@ It can also be used to compute the net flow of particles in that region of space
> This is not the **discrete Laplace operator**, which is instead a **matrix**,
> as there are many other formulations.

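As a concrete illustration (a sketch of my own, not from the notes), the classic
5-point finite-difference approximation of the Laplacian on a 2D grid:

```python
import numpy as np

def laplacian_2d(f: np.ndarray, h: float = 1.0) -> np.ndarray:
    """5-point stencil: sum of second differences along each axis."""
    lap = np.zeros_like(f)
    lap[1:-1, 1:-1] = (
        f[2:, 1:-1] + f[:-2, 1:-1]    # neighbors along x
        + f[1:-1, 2:] + f[1:-1, :-2]  # neighbors along y
        - 4.0 * f[1:-1, 1:-1]
    ) / h**2
    return lap

# For f(x, y) = x^2 + y^2 the Laplacian is 4 everywhere
x, y = np.meshgrid(np.linspace(-1, 1, 21), np.linspace(-1, 1, 21), indexing="ij")
h = x[1, 0] - x[0, 0]
print(laplacian_2d(x**2 + y**2, h)[5:8, 5:8])  # ~4.0 in the interior
```
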
## [Hessian Matrix](https://en.wikipedia.org/wiki/Hessian_matrix)
The Hessian Matrix represents the 2nd derivative of a function, and thus gives
us the local curvature of the function.

It is also used to tell us whether a critical point is a local minimum (the
Hessian is positive definite), a local maximum (negative definite), or a saddle
point (neither positive nor negative definite).

It is computed by taking the partial derivatives of the gradient along all
dimensions and then transposing the result.

$$
\nabla f = \begin{bmatrix}
\frac{\partial f}{\partial x} & \frac{\partial f}{\partial y}
\end{bmatrix} \\
H(f) = \begin{bmatrix}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \, \partial y} \\
\frac{\partial^2 f}{\partial y \, \partial x} & \frac{\partial^2 f}{\partial y^2}
\end{bmatrix}
$$

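For instance, for $f(x, y) = x^2 - y^2$ (a worked example of my own choosing)
the Hessian is neither positive nor negative definite, so the critical point at
the origin is a saddle:

$$
\nabla f = \begin{bmatrix} 2x & -2y \end{bmatrix}, \qquad
H(f) = \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}
$$
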
[^khan-1]: [Khan Academy | Laplace Intuition | 9th November 2025](https://www.youtube.com/watch?v=EW08rD-GFh0)

[^wiki-cross-entropy]: [Wikipedia | Cross Entropy | 17th November 2025](https://en.wikipedia.org/wiki/Cross-entropy)

[^wiki-entropy]: [Wikipedia | Entropy | 17th November 2025](https://en.wikipedia.org/wiki/Entropy_(information_theory))

[^wiki-pca]: [Wikipedia | Principal Component Analysis | 18th November 2025](https://en.wikipedia.org/wiki/Principal_component_analysis#Computation_using_the_covariance_method)

Chapters/15-Appendix-A/python-experiments/pca.ipynb (new file, 199 lines)
@@ -0,0 +1,199 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8c14ea22",
   "metadata": {},
   "source": [
    "# Computing PCA\n",
    "\n",
    "Here I'll be taking data from [Geeks4Geeks](https://www.geeksforgeeks.org/machine-learning/mathematical-approach-to-pca/)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0b32eb5c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1.8 1.87777778]\n",
      "[[ 0.7 0.52222222]\n",
      " [-1.3 -1.17777778]\n",
      " [ 0.4 1.02222222]\n",
      " [ 1.3 1.12222222]\n",
      " [ 0.5 0.82222222]\n",
      " [ 0.2 -0.27777778]\n",
      " [-0.8 -0.77777778]\n",
      " [-0.3 -0.27777778]\n",
      " [-0.7 -0.97777778]]\n",
      "[[0.6925 0.68875 ]\n",
      " [0.68875 0.79444444]]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "X : np.ndarray = np.array([\n",
    " [2.5, 2.4],\n",
    " [0.5, 0.7],\n",
    " [2.2, 2.9],\n",
    " [3.1, 3.0],\n",
    " [2.3, 2.7],\n",
    " [2.0, 1.6],\n",
    " [1.0, 1.1],\n",
    " [1.5, 1.6],\n",
    " [1.1, 0.9]\n",
    "])\n",
    "\n",
    "# Compute mean values for features\n",
    "mu_X = np.mean(X, 0)\n",
    "\n",
    "print(mu_X)\n",
    "# \"Normalize\" Features\n",
    "X = X - mu_X\n",
    "print(X)\n",
    "\n",
    "# Compute covariance matrix applying\n",
    "# Bessel's correction (n-1) instead of n\n",
    "Cov = (X.T @ X) / (X.shape[0] - 1)\n",
    "\n",
    "print(Cov)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78e9429f",
   "metadata": {},
   "source": [
"As you can notice, we did $X^T \\times X$ instead of $X \\times X^T$. This is because our \n",
|
||||
"dataset had datapoints over rows instead of features."
|
||||
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "id": "f93b7a92",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.05283865 1.43410579]\n",
      "[[-0.73273632 -0.68051267]\n",
      " [ 0.68051267 -0.73273632]]\n"
     ]
    }
   ],
   "source": [
    "# Computing eigenvalues\n",
    "eigen = np.linalg.eig(Cov)\n",
    "eigen_values = eigen.eigenvalues\n",
    "eigen_vectors = eigen.eigenvectors\n",
    "\n",
    "print(eigen_values)\n",
    "print(eigen_vectors)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bfbdd9c3",
   "metadata": {},
   "source": [
"Now we'll generate the new X matrix by only using the first eigen vector"
|
||||
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "id": "7ce6c540",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(9, 1)\n",
      "Compressed\n",
      "[[-0.85901005]\n",
      " [ 1.74766702]\n",
      " [-1.02122441]\n",
      " [-1.70695945]\n",
      " [-0.94272842]\n",
      " [ 0.06743533]\n",
      " [ 1.11431616]\n",
      " [ 0.40769167]\n",
      " [ 1.19281215]]\n",
      "Reconstruction\n",
      "[[ 0.58456722 0.62942786]\n",
      " [-1.18930955 -1.28057909]\n",
      " [ 0.69495615 0.74828821]\n",
      " [ 1.16160753 1.25075117]\n",
      " [ 0.64153863 0.69077135]\n",
      " [-0.0458906 -0.04941232]\n",
      " [-0.75830626 -0.81649992]\n",
      " [-0.27743934 -0.29873049]\n",
      " [-0.81172378 -0.87401678]]\n",
      "Difference\n",
      "[[0.11543278 0.10720564]\n",
      " [0.11069045 0.10280131]\n",
      " [0.29495615 0.27393401]\n",
      " [0.13839247 0.12852895]\n",
      " [0.14153863 0.13145088]\n",
      " [0.2458906 0.22836546]\n",
      " [0.04169374 0.03872214]\n",
      " [0.02256066 0.02095271]\n",
      " [0.11172378 0.10376099]]\n"
     ]
    }
   ],
   "source": [
"# Computing X coming from only 1st eigen vector\n",
|
||||
"Z_pca = X @ eigen_vectors[:,1]\n",
|
||||
"Z_pca = Z_pca.reshape([Z_pca.shape[0], 1])\n",
|
||||
"\n",
|
||||
"print(Z_pca.shape)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# X reconstructed\n",
|
||||
"eigen_v = (eigen_vectors[:, 1].reshape([eigen_vectors[:, 1].shape[0], 1]))\n",
|
||||
"X_rec = Z_pca @ eigen_v.T\n",
|
||||
"\n",
|
||||
"print(\"Compressed\")\n",
|
||||
"print(Z_pca)\n",
|
||||
"\n",
|
||||
"print(\"Reconstruction\")\n",
|
||||
"print(X_rec)\n",
|
||||
"\n",
|
||||
"print(\"Difference\")\n",
|
||||
"print(abs(X - X_rec))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "deep_learning",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.13.7"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||