Revised Optimization Notes

Christian Risi
2025-11-20 18:47:36 +01:00
parent 2a96deaebf
commit 934c08d4c0
6 changed files with 1226 additions and 429 deletions

View File

@@ -33,7 +33,8 @@ $$
## Cross Entropy Loss derivation
Cross entropy[^wiki-cross-entropy] is the measure of *"surprise"* we get when
outcomes drawn from distribution $p$ are scored with the probabilities of
distribution $q$. It is defined as the entropy of $p$ plus the
[Kullback-Leibler Divergence](#kullback-leibler-divergence) between $p$ and $q$
@@ -62,6 +63,23 @@ Usually $\hat{y}$ comes from using a
[softmax](./../3-Activation-Functions/INDEX.md#softmax). Moreover, since it uses a
logarithm and probability values are at most 1, the closer the predicted
probability of the true class is to 0, the higher the loss
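
A minimal sketch of this behaviour, assuming a one-hot target and an already
softmaxed prediction (the numbers below are made up just for illustration):

```python
import numpy as np

def cross_entropy(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Cross entropy between a one-hot target y and predicted probabilities y_hat."""
    return float(-np.sum(y * np.log(y_hat)))

y = np.array([0.0, 1.0, 0.0])  # one-hot target

# The smaller the probability assigned to the true class, the larger the loss
print(cross_entropy(y, np.array([0.10, 0.80, 0.10])))  # ~0.22
print(cross_entropy(y, np.array([0.45, 0.10, 0.45])))  # ~2.30
```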
## Computing PCA[^wiki-pca]
> [!CAUTION]
> $X$ here is the dataset matrix with **<ins>features over rows</ins>**
> (one row per feature, one column per datapoint)
- $\Sigma = \frac{X \times X^T}{N} \coloneqq$ Covariance Matrix estimate
  (assuming the features of $X$ are mean-centered)
- $\vec{\lambda} \coloneqq$ vector of eigenvalues of $\Sigma$
- $\Lambda \coloneqq$ matrix whose columns are the eigenvectors of $\Sigma$,
  sorted by decreasing eigenvalue
- $\Lambda_{red} \coloneqq$ $\Lambda$ truncated to the eigenvectors of the $k$
  highest eigenvalues
- $Z = \Lambda_{red}^T \times X \coloneqq$ Compressed representation
  (see the sketch after the note below)
> [!NOTE]
> You may have studied PCA in terms of the SVD (Singular Value Decomposition).
> The two are closely related and capture the same idea, but through different
> mathematical formulations.
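
A minimal sketch of the recipe above, assuming features over rows and
mean-centered data (the matrix values are made up just for illustration):

```python
import numpy as np

# X has features over rows: shape (d, N) = (2 features, 5 datapoints)
X = np.array([
    [2.0, -1.0, 0.5, -0.5, -1.0],
    [1.5, -1.5, 1.0,  0.0, -1.0],
])
X = X - X.mean(axis=1, keepdims=True)   # center each feature

N = X.shape[1]
Sigma = (X @ X.T) / N                   # covariance matrix estimate

# Sigma is symmetric, so eigh applies; it returns eigenvalues in ascending order
eigenvalues, Lambda = np.linalg.eigh(Sigma)
order = np.argsort(eigenvalues)[::-1]   # sort by decreasing eigenvalue
Lambda = Lambda[:, order]

k = 1
Lambda_red = Lambda[:, :k]              # keep the top-k eigenvectors
Z = Lambda_red.T @ X                    # compressed representation, shape (k, N)
print(Z.shape)                          # (1, 5)
```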
## Laplace Operator[^khan-1]
It is defined as $\nabla \cdot \nabla f \in \R$ and is equivalent to the
@@ -80,8 +98,32 @@ It can also be used to compute the net flow of particles in that region of space
> This is not the **discrete Laplace operator**, which is instead a **matrix**;
> there are many other formulations of it.
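
For example, taking $f(x, y) = x^2 + y^2$, the Laplace operator is the sum of
the unmixed second partial derivatives:

$$
\nabla \cdot \nabla f = \frac{\partial^2 f}{\partial x^2} +
\frac{\partial^2 f}{\partial y^2} = 2 + 2 = 4
$$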
## [Hessian Matrix](https://en.wikipedia.org/wiki/Hessian_matrix)
A Hessian Matrix represents the 2nd derivative of a function, thus it gives
us its curvature.
At a critical point it also tells us whether that point is a local minimum
(the Hessian is positive definite), a local maximum (negative definite) or a
saddle point (indefinite, i.e. neither positive nor negative definite).
It is computed by taking the partial derivatives of the gradient along all
dimensions and then transposing the result (for well-behaved functions the
Hessian is symmetric, so the transpose changes nothing).
$$
\nabla f = \begin{bmatrix}
\frac{\partial f}{\partial x} & \frac{\partial f}{\partial y}
\end{bmatrix} \\
H(f) = \begin{bmatrix}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \, \partial y} \\
\frac{\partial^2 f}{\partial y \, \partial x} & \frac{\partial^2 f}{\partial y^2}
\end{bmatrix}
$$
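
A minimal sketch of the definiteness test, using the simple polynomial
$f(x, y) = x^2 - y^2$ (which has a critical point at the origin) purely as an
illustration:

```python
import numpy as np

# Hessian of f(x, y) = x^2 - y^2; it is constant for this polynomial
H = np.array([
    [2.0,  0.0],
    [0.0, -2.0],
])

# For a symmetric matrix, the signs of its eigenvalues give the definiteness
eigenvalues = np.linalg.eigvalsh(H)

if np.all(eigenvalues > 0):
    print("positive definite -> local minimum")
elif np.all(eigenvalues < 0):
    print("negative definite -> local maximum")
else:
    print("indefinite -> saddle point")  # this branch runs: eigenvalues are 2 and -2
```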
[^khan-1]: [Khan Academy | Laplace Intuition | 9th November 2025](https://www.youtube.com/watch?v=EW08rD-GFh0)
[^wiki-cross-entropy]: [Wikipedia | Cross Entropy | 17th November 2025](https://en.wikipedia.org/wiki/Cross-entropy)
[^wiki-entropy]: [Wikipedia | Entropy | 17th November 2025](https://en.wikipedia.org/wiki/Entropy_(information_theory))
[^wiki-pca]: [Wikipedia | Principal Component Analysis | 18th November 2025](https://en.wikipedia.org/wiki/Principal_component_analysis#Computation_using_the_covariance_method)

View File

@@ -0,0 +1,199 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "8c14ea22",
"metadata": {},
"source": [
"# Computing PCA\n",
"\n",
"Here I'll be taking data from [Geeks4Geeks](https://www.geeksforgeeks.org/machine-learning/mathematical-approach-to-pca/)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0b32eb5c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1.8 1.87777778]\n",
"[[ 0.7 0.52222222]\n",
" [-1.3 -1.17777778]\n",
" [ 0.4 1.02222222]\n",
" [ 1.3 1.12222222]\n",
" [ 0.5 0.82222222]\n",
" [ 0.2 -0.27777778]\n",
" [-0.8 -0.77777778]\n",
" [-0.3 -0.27777778]\n",
" [-0.7 -0.97777778]]\n",
"[[0.6925 0.68875 ]\n",
" [0.68875 0.79444444]]\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"X : np.ndarray = np.array([\n",
" [2.5, 2.4],\n",
" [0.5, 0.7],\n",
" [2.2, 2.9],\n",
" [3.1, 3.0],\n",
" [2.3, 2.7],\n",
" [2.0, 1.6],\n",
" [1.0, 1.1],\n",
" [1.5, 1.6],\n",
" [1.1, 0.9]\n",
"])\n",
"\n",
"# Compute mean values for features\n",
"mu_X = np.mean(X, 0)\n",
"\n",
"print(mu_X)\n",
"# \"Normalize\" Features\n",
"X = X - mu_X\n",
"print(X)\n",
"\n",
"# Compute covariance matrix applying\n",
"# Bessel's correction (n-1) instead of n\n",
"Cov = (X.T @ X) / (X.shape[0] - 1)\n",
"\n",
"print(Cov)"
]
},
{
"cell_type": "markdown",
"id": "78e9429f",
"metadata": {},
"source": [
"As you can notice, we did $X^T \\times X$ instead of $X \\times X^T$. This is because our \n",
"dataset had datapoints over rows instead of features."
]
},
{
"cell_type": "code",
"execution_count": 84,
"id": "f93b7a92",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.05283865 1.43410579]\n",
"[[-0.73273632 -0.68051267]\n",
" [ 0.68051267 -0.73273632]]\n"
]
}
],
"source": [
"# Computing eigenvalues\n",
"eigen = np.linalg.eig(Cov)\n",
"eigen_values = eigen.eigenvalues\n",
"eigen_vectors = eigen.eigenvectors\n",
"\n",
"print(eigen_values)\n",
"print(eigen_vectors)"
]
},
{
"cell_type": "markdown",
"id": "bfbdd9c3",
"metadata": {},
"source": [
"Now we'll generate the new X matrix by only using the first eigen vector"
]
},
{
"cell_type": "code",
"execution_count": 85,
"id": "7ce6c540",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(9, 1)\n",
"Compressed\n",
"[[-0.85901005]\n",
" [ 1.74766702]\n",
" [-1.02122441]\n",
" [-1.70695945]\n",
" [-0.94272842]\n",
" [ 0.06743533]\n",
" [ 1.11431616]\n",
" [ 0.40769167]\n",
" [ 1.19281215]]\n",
"Reconstruction\n",
"[[ 0.58456722 0.62942786]\n",
" [-1.18930955 -1.28057909]\n",
" [ 0.69495615 0.74828821]\n",
" [ 1.16160753 1.25075117]\n",
" [ 0.64153863 0.69077135]\n",
" [-0.0458906 -0.04941232]\n",
" [-0.75830626 -0.81649992]\n",
" [-0.27743934 -0.29873049]\n",
" [-0.81172378 -0.87401678]]\n",
"Difference\n",
"[[0.11543278 0.10720564]\n",
" [0.11069045 0.10280131]\n",
" [0.29495615 0.27393401]\n",
" [0.13839247 0.12852895]\n",
" [0.14153863 0.13145088]\n",
" [0.2458906 0.22836546]\n",
" [0.04169374 0.03872214]\n",
" [0.02256066 0.02095271]\n",
" [0.11172378 0.10376099]]\n"
]
}
],
"source": [
"# Computing X coming from only 1st eigen vector\n",
"Z_pca = X @ eigen_vectors[:,1]\n",
"Z_pca = Z_pca.reshape([Z_pca.shape[0], 1])\n",
"\n",
"print(Z_pca.shape)\n",
"\n",
"\n",
"# X reconstructed\n",
"eigen_v = (eigen_vectors[:, 1].reshape([eigen_vectors[:, 1].shape[0], 1]))\n",
"X_rec = Z_pca @ eigen_v.T\n",
"\n",
"print(\"Compressed\")\n",
"print(Z_pca)\n",
"\n",
"print(\"Reconstruction\")\n",
"print(X_rec)\n",
"\n",
"print(\"Difference\")\n",
"print(abs(X - X_rec))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "deep_learning",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}