Rework of chapters 2 and 3

This commit is contained in:
Christian Risi 2025-08-19 20:03:51 +02:00
parent 1a3e33b5ed
commit 642559ee4d
3 changed files with 618 additions and 51 deletions

View File

@ -63,18 +63,23 @@ A search problem is the union of the following:
`goal-states`
- **Available Actions**
All the `actions` available to the `agent`:
```python
def get_actions(state: State) -> set[Action]
```
- **Transition Model**
A `function` which returns the `next-state` after taking
an `action` in the `current-state`:
```python
def move_to_next_state(state: State, action: Action) -> State
```
- **Action Cost Function**
A `function` which denotes the cost of taking that
`action` to reach a `new-state` from `current-state`:
```python
def action_cost(
current_state: State,
@ -130,7 +135,8 @@ For each `action` we generate a `node` and each `generated-node`, whether
We have 4 parameters:
- #### Completeness
Is the `algorithm` guaranteed to find a `solution`, if one exists, and to report
failure when there is ***no solution***?
@ -138,19 +144,22 @@ We have 4 parameters:
algorithm for `infinite` ones, though it would be difficult to report
***no solution*** as it is impossible to explore the ***whole `space`***.
- #### Cost Optimality
Can it find the `optimal-solution`?
- #### Time Complexity
`O(n) time` performance
- #### Space Complexity
`O(n) space` performance, given explicitly (if the `graph` is ***explicit***) or
by means of:
- `depth` of `actions` for an `optimal-solution`
- `max-number-of-actions` in **any** `path`
- `branching-factor` for a node
### Uninformed Algorithms
@ -263,6 +272,7 @@ However the `space-complexity` and `time-complexity` are
`max-branching-factor` and $d$ is the `search-depth`[^breadth-first-performance]
This algorithm is:
- `optimal` (as long as each `action` has the same `cost`)
- `complete`
$O(b^{1 + \frac{C^*}{\epsilon}})$ for both `time` and `space-complexity`
In the `worst-case` the `complexity` is $O(b^{d + 1})$ when all `actions` cost $\epsilon$
This algorithm is:
- `optimal`
- `complete`
being the ***negative*** of `depth`. However we can use a `LIFO Queue` instead of a `Priority Queue` with a
`cost_function`, and delete the `reached_space` `dict`.
This algorithm is:
- `non-optimal` as it returns the ***first*** `solution`, not the ***best***
- `incomplete` as it is `non-systematic`, but it is `complete` for `acyclic graphs`
and `trees`
@ -483,6 +495,7 @@ One addition in this algorithm is the `max_depth` parameter. Once we go past
`max_depth`, we don't expand any further.
This algorithm is:
- `non-optimal` (see [Depth-First Search](#depth-first-search))
- `incomplete` (see [Depth-First Search](#depth-first-search)) and it now depends
on the `max_depth` as well
@ -531,6 +544,7 @@ This is a ***flavour*** of `depth-limited search`: whenever it reaches the `max_
the `search` is ***restarted*** until a `solution` is found.
This algorithm is:
- `non-optimal` (see [Depth-First Search](#depth-first-search))
- `incomplete` (see [Depth-First Search](#depth-first-search)) and it now depends
on the `max_depth` as well
@ -663,6 +677,7 @@ a [Best-First Search](#best-first-search), making us save up on ***memory*** and
On the other hand, the algorithm itself is ***harder*** to implement
This algorithm is:
- `optimal` (see [Best-First Search](#best-first-search))
- `complete` (see [Best-First Search](#best-first-search))
- $O(b^{1 + \frac{C^*}{2\epsilon}})$ `time-complexity`

View File

@ -0,0 +1,249 @@
# Intelligent Agents
## Task Environment
### Performance
How well the job is done
### Environment
Something we must accept as it is
### Actuators
Something we can use to change our Environment
### Sensors
Something we can use to perceive our Environment
## Properties of a Task Environment
- Observability
- Fully Observable: Sensors gather all info
- Partially Observable: Sensors gather info, but some is unavailable
- Unobservable: No Sensors at all
- Number of Agents:
- Multi-Agent: Other Agents (even people may be considered agents) live in the environment
> [!NOTE]
> In order to be considered an agent, the other entity must maximize an objective
> that depends on our agent's behaviour
- Cooperative: All agents try to maximize the same objective
- Competitive: Agent objective can be maximized penalizing other agents objective
- Single-Agent: Only one agent exists
- Predictability
- Deterministic: We can predict everything
- Stochastic: We can predict outcomes according to some probabilities
- Nondeterministic: We cannot predict everything nor the probability
- Memory-Dependent
- Episodic: Each stimulus-action is independent from previous actions
- Sequential: Current actions may influence ones in the future, so we need to keep memory
- Staticity
- Static: Environment does not change **while our agent is deciding**
- Dynamic: Environment changes **while our agent is deliberating**
- Semi-Dynamic: Environment is Static, but the agent's performance score changes with time
- Continuity **(Applies to States)**
- Continuous: State has continuous elements
- Discrete: State has no continuous elements
- Knowledge
> [!CAUTION]
> It is not influenced by Observability, as it refers to the **outcomes
> of actions** and **not to the state of the environment**
- Known: Each rule is known a priori (known outcomes)
- Unknown: The agent must discover environment rules (unknown outcomes)
### Environment classes
According to these properties, we can define a class of Environments on which we can
test our agents
### Architecture
The actual *hardware* (actuators and sensors) on which our program will run, like a robot or
a PC.
## Agent Programs
> [!NOTE]
> All programs have the same pseudo code:
>
> ```python
> def agent(percept) -> Action:
> ```
### Table Driven Agent
It basically has a reaction for every possible percept sequence up to time $\mathcal{T}_i$, thus
a table of size $\sum_{t=1}^{T}|\mathcal{S}|^{t}$, which quickly becomes enormous
> [!TIP]
> It is actually complete and makes us react in the best possible way
> [!CAUTION]
> It is very memory consuming, so it is only suitable for small problems
```python
class TableDrivenAgent(Agent):
    def __init__(
        self,
        # Keys must be hashable, so tuples of percepts are used instead of lists
        action_table: dict[tuple[Percept, ...], Action]
    ):
        self.__action_table = action_table
        self.__percept_sequence: list[Percept] = []

    def agent(self, percept: Percept) -> Action:
        # Append the new percept to the whole perceived history
        self.__percept_sequence.append(percept)

        # Look up the action indexed by the full percept sequence
        return self.__action_table.get(
            tuple(self.__percept_sequence)
        )
```
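A minimal usage sketch of the class above, assuming the `Agent` base class and the `Percept`/`Action` types from these notes exist; plain strings are used here only for illustration. Note that the table is keyed by the **whole** percept sequence:
```python
# Hypothetical percepts/actions for a tiny vacuum-like world
action_table = {
    ("dirty",): "suck",
    ("clean",): "move_right",
    ("clean", "dirty"): "suck",
}

agent = TableDrivenAgent(action_table)
print(agent.agent("clean"))  # -> "move_right"
print(agent.agent("dirty"))  # -> "suck", looked up with key ("clean", "dirty")
```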
### Reflex Agent
Acts only based on stimuli at that very time $t$
> [!TIP]
> It only reacts to stimuli based on the current state, so it's smaller and very fast
> [!CAUTION]
> It is very limited in its capabilities and **requires a fully observable environment**
```python
class ReflexAgent(Agent):
    def __init__(
        self,
        rules: list[Rule],  # It depends on how the actual function is implemented
    ):
        self.__rules = rules

    def agent(self, percept: Percept) -> Action:
        status = self.__get_status(percept)
        rule = self.__get_rule(status)
        return rule.action

    # MUST BE IMPLEMENTED
    def __get_status(self, percept: Percept) -> Status:
        pass

    # MUST BE IMPLEMENTED (it uses our rules)
    def __get_rule(self, status: Status) -> Rule:
        pass
```
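A hedged sketch of how the rule matching could work; the `Rule` structure below (a condition predicate plus an action) is an assumption of this example, not something fixed by the notes:
```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    condition: Callable[[str], bool]  # When does this rule apply?
    action: str                       # What to do when it applies

def match_rule(rules: list[Rule], status: str) -> Rule:
    # Return the first rule whose condition matches the current status
    return next(rule for rule in rules if rule.condition(status))

# Hypothetical vacuum-like rules
rules = [
    Rule(condition=lambda s: s == "dirty", action="suck"),
    Rule(condition=lambda s: s == "clean", action="move_right"),
]
print(match_rule(rules, "dirty").action)  # -> "suck"
```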
### Model-Based Reflex Agent
This agent has 2 models that help it keep an internal representation of the world:
- **Transition Model**:\
Knowledge of *how the world works*
(What my actions do and how the world evolves without me)
- **Sensor Model**:\
Knowledge of *how the world is reflected in the agent's percepts*
> [!TIP]
> It only reacts to stimuli based on the current state, but is also capable of predicting
> the next state, thus having some info on unobserved states
> [!CAUTION]
> It is more complicated to code and slightly worse in terms of raw speed. While it is
> more flexible, it is still somewhat limited
```python
class ModelBasedReflexAgent(Agent):
    def __init__(
        self,
        initial_state: State,
        transition_model: TransitionModel,  # Implementation dependent
        sensor_model: SensorModel,  # Implementation dependent
        rules: Rules  # Implementation dependent
    ):
        self.__current_state = initial_state
        self.__transition_model = transition_model
        self.__sensor_model = sensor_model
        self.__rules = rules
        self.__last_action: Action | None = None

    def agent(self, percept: Percept) -> Action:
        self.__update_state(percept)
        rule = self.__get_rule()
        self.__last_action = rule.action  # Remember what we did, for the next update
        return rule.action

    # MUST BE IMPLEMENTED
    def __update_state(self, percept: Percept) -> None:
        """
        Uses:
        - percept
        - self.__current_state,
        - self.__last_action,
        - self.__transition_model,
        - self.__sensor_model
        """
        # Do something, then store the new internal state
        self.__current_state = ...

    # MUST BE IMPLEMENTED (it uses our rules)
    def __get_rule(self) -> Rule:
        """
        Uses:
        self.__current_state,
        self.__rules
        """
        # Do something
        rule = ...
        return rule
```
### Goal-Based Agent
It's an agent that **has an internal state representation**, **can predict the next state** and **chooses an action that satisfies its goals**.
> [!TIP]
> It is very flexible and needs less hardcoded info, compared to a reflex-based approach, allowing a rapid
> change of goals without reprogramming the agent
> [!CAUTION]
> It is more computationally expensive to implement, moreover it requires a strategy to choose the best action
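A minimal sketch of the idea, assuming a `search` planner (like the ones in the next chapter) that returns a sequence of actions reaching the goal; all the names below are illustrative:
```python
class GoalBasedAgent:
    def __init__(self, initial_state, goal_test, search):
        self.state = initial_state
        self.goal_test = goal_test  # Predicate: is this state a goal?
        self.search = search        # Planner returning a list of actions
        self.plan: list = []        # Actions still to be executed

    def agent(self, percept):
        self.state = percept        # Assume a fully observable environment
        if not self.plan:
            # Plan ahead: find a sequence of actions that satisfies the goal
            self.plan = self.search(self.state, self.goal_test)
        return self.plan.pop(0)     # Execute the plan one action at a time
```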
### Utility-Based Agent
It's an agent that **performs to maximize its expected utility**, useful when **goals are incompatible** and there's **uncertainty
about how to achieve a goal**
> [!TIP]
> Useful when we have many goals with different importance and when we need to balance some incompatible ones
> [!CAUTION]
> It adds another layer of computation and since it chooses based on **estimates**, it could be wrong
> [!NOTE]
> Not every goal-based agent has a model to guide it
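A minimal sketch of action selection by expected utility; `actions`, `outcomes` and `utility` are assumed, illustrative callables:
```python
def choose_action(state, actions, outcomes, utility):
    """
    actions(state)      -> iterable of available actions
    outcomes(state, a)  -> list of (probability, next_state) pairs
    utility(next_state) -> how desirable that state is
    """
    def expected_utility(action):
        return sum(p * utility(s) for p, s in outcomes(state, action))

    # Pick the action with the highest expected utility
    return max(actions(state), key=expected_utility)
```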
## Learning Agent
Any agent may be a Learning Agent, and it is composed of:
- **Learning Element**: Responsible for improvements
- **Performance Element**: The entire agent, which takes actions
- **Critic**: Gives Feedback to the Learning element on how to improve the agent
- **Problem Generator**: Suggests new actions to promote exploration of possibilities, instead of always repeating the **best known action**
## States Representation
> [!NOTE]
> Read pg 76 to 78 of Artificial Intelligence: A Modern Approach

View File

@ -0,0 +1,303 @@
# Solving Problems by Searching
> [!WARNING]
> In this chapter we'll be talking about an environment with these properties:
>
> - episodic
> - single agent
> - fully observable
> - deterministic
> - static
> - discrete
> - known
## Problem-solving agent
It uses a **search method** to solve a problem. This agent is useful when it is not possible
to define an immediate action to perform, but rather it is necessary to **plan ahead**.
However, this agent only uses **Atomic States**, making it faster but limiting its
power.
## Algorithms
They can be of 2 types:
- **Informed**: The agent can estimate how far it is from the goal
- **Uninformed**: The agent can't estimate how far it is from the goal
The solving process is usually composed of 4 phases:
- **Goal Formulation**: Choose a goal
- **Problem Formulation**: Create a representation of the world
- **Search**: Before taking any action, search a sequence that brings the agent to the goal
- **Execution**: Execute the solution (the sequence of actions to the goal) in the real world
> [!NOTE]
> This is technically an **open loop**, as after the search phase the agent **will ignore any further percepts**
### Problem
This is what we are trying to solve with our agent, and it's a description of our environment
- Set of States - **State space**
- Initial State
- One or more Goal States
- Available Actions - `def actions(state: State) -> list[Action]`
- Transition Model - `def result(state: State, action: Action) -> State`
- Action Cost Function - `def action_cost(current_state: State, action: Action, new_state: State) -> float`
The sequence of actions that brings us to the goal is a solution, and if it has the lowest cost, then it is optimal (a small concrete example is sketched below, after the tip).
> [!TIP]
> A State-Space can be represented as a graph
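As a concrete, made-up example, the components above could be encoded like this for a tiny route-finding problem (the map and costs are purely illustrative):
```python
# Hypothetical state space: locations connected by roads with a cost (distance)
roads: dict[str, dict[str, float]] = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 2.0, "D": 5.0},
    "C": {"D": 1.0},
    "D": {},
}

initial_state = "A"
goal_states = {"D"}

def actions(state: str) -> list[str]:
    return list(roads[state])      # Moving to a neighbouring location is an action

def result(state: str, action: str) -> str:
    return action                  # The action is named after the destination

def action_cost(current_state: str, action: str, new_state: str) -> float:
    return roads[current_state][new_state]
```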
### Search algorithms
We usually use a tree to represent our state space:
- Root node: Initial State
- Child nodes: Reachable States
- Frontier: Nodes that have been generated but not yet expanded
```python
class Node:
    def __init__(
        self,
        state: State,            # State represented
        parent: "Node | None",   # Node that generated this node
        action: Action,          # Action that generated this node
        path_cost: float         # Total cost to reach this node
    ):
        self.state = state
        self.parent = parent
        self.action = action
        self.path_cost = path_cost
```
```python
class Frontier:
    """
    Can be backed by:
    - a priority queue (heapq)         -> Best-first search
    - a FIFO queue (collections.deque) -> Breadth-first search
    - a LIFO stack (list)              -> Depth-first search
    """
    def is_empty(self) -> bool:
        """
        True if no nodes in Frontier
        """
        pass

    def pop(self) -> Node:
        """
        Returns the top node and removes it
        """
        pass

    def top(self) -> Node:
        """
        Returns top node without removing it
        """
        pass

    def add(self, node: Node):
        """
        Adds node to frontier
        """
        pass
```
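With these two building blocks, the general search loop can be sketched as follows (in the spirit of AIMA's best-first search). It assumes the problem is packaged into an object exposing `initial_state`, `is_goal(state)`, `actions(state)`, `result(state, action)` and `action_cost(...)` (a hypothetical wrapper around the components listed above), and it uses `heapq` directly as a priority-queue frontier:
```python
import heapq

def best_first_search(problem, evaluation):
    root = Node(problem.initial_state, parent=None, action=None, path_cost=0.0)
    frontier = [(evaluation(root), id(root), root)]  # Heap ordered by evaluation
    reached = {problem.initial_state: root}          # Best node found per state

    while frontier:
        _, _, node = heapq.heappop(frontier)
        if problem.is_goal(node.state):
            return node                              # Solution found
        for action in problem.actions(node.state):   # Expand the node
            new_state = problem.result(node.state, action)
            cost = node.path_cost + problem.action_cost(node.state, action, new_state)
            if new_state not in reached or cost < reached[new_state].path_cost:
                child = Node(new_state, parent=node, action=action, path_cost=cost)
                reached[new_state] = child           # Remember only the best path
                heapq.heappush(frontier, (evaluation(child), id(child), child))
    return None                                      # Failure: no solution
```
Plugging in `evaluation=lambda n: n.path_cost` gives the uninformed best-first behaviour described below; other evaluation functions give the other strategies.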
#### Loop Detection
There are 3 main techniques to avoid loopy paths
- Remember all previous states and keep only the best path
- Used in best-first search
- Fast computation
- Huge memory requirement
- Just forget about it (Tree-like search, because it does not check for redundant paths)
- Not every problem has repeated states, or it has only a few
- no memory overhead
- may potentially slow computation by a lot, or prevent it from ever terminating
- Check for repeated states along a chain (a small sketch follows after this list)
- Slows computation based on how many links are inspected
- Computation overhead
- No memory impact
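A minimal sketch of the third technique (checking for a repeated state along the chain of ancestors), using the `Node` class defined earlier:
```python
def is_cycle(node: Node) -> bool:
    # Walk up the chain of parents and check whether the state repeats
    ancestor = node.parent
    while ancestor is not None:
        if ancestor.state == node.state:
            return True
        ancestor = ancestor.parent
    return False
```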
#### Performance metrics for search algorithms
- Completeness:\
Can the algorithm **always** find a solution **if one exists**, or report failure if **none** exists?
- Cost optimality:\
Does the solution found have the **lowest cost**?
- Time complexity:\
O(n) notation for computation
- Space complexity:\
O(n) notation for memory
> [!NOTE]
> We'll be using the following notation
>
> - $b$: branching factor (max number of branches by one action)
> - $d$: solution depth
> - $m$: maximum tree depth
> - $C$: cost
> - $C^*$: Theoretical optimal cost
> - $\epsilon$: Minimum action cost
#### Uninformed Strategies
##### Best First Search
- Take node with least cost and expand
- Frontier must be a Priority Queue (or Heap Queue)
- Equivalent to a Breadth-First approach when each action has the same cost
- If $\epsilon = 0$ then the algorithm may stall, thus give every action a minimal cost
It is:
- Complete
- Optimal
- $O(b^{1 + \frac{C^*}{\epsilon}})$ Time Complexity
- $O(b^{1 + \frac{C^*}{\epsilon}})$ Space Complexity
> [!NOTE]
> If every action costs $\epsilon$ this is basically a breadth-first search explored one level deeper ($O(b^{d+1})$ instead of $O(b^{d})$)
##### Breadth-First Search
- Take the nodes that were generated first (FIFO order)
- Frontier should be a Queue
- Equivalent to a Best-First Search with an evaluation function that takes depth as the cost
It is:
- Complete
- Optimal as **long as the cost is uniform across all actions**
- $O(b^d)$ Time Complexity
- $O(b^d)$ Space Complexity
##### Depth-First Search
It is the best option whenever we need to save on memory, implemented as a tree-like search, but it has many drawbacks
- Take the nodes that were generated last (LIFO order)
- Frontier should be a Stack
- Equivalent to a Best-First Search when the evaluation function has -depth as the cost
It is:
- Complete as long as the space is finite
- Non-Optimal
- $O(b^m)$ Time Complexity
- $O(bm)$ Space complexity
> [!TIP]
> It has 2 variants
>
> - Backtracking Search: Uses less memory going from $O(bm)$ to $O(m)$ Space complexity at the expense of making the algorithm more complex
> - Depth-Limited: We cap the depth at a certain limit and we don't expand once we reach it. By increasing such limit iteratively, we obtain an **Iterative deepening search** (both are sketched below)
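A compact sketch of depth-limited search and of the iterative deepening built on top of it; it reuses the hypothetical `problem` object and the `Node` class from before, and `"cutoff"` is used as a sentinel to distinguish "limit reached" from "no solution":
```python
def depth_limited_search(problem, node, limit: int):
    if problem.is_goal(node.state):
        return node                          # Solution found
    if limit == 0:
        return "cutoff"                      # Depth limit reached
    result = None
    for action in problem.actions(node.state):
        new_state = problem.result(node.state, action)
        cost = node.path_cost + problem.action_cost(node.state, action, new_state)
        child = Node(new_state, parent=node, action=action, path_cost=cost)
        outcome = depth_limited_search(problem, child, limit - 1)
        if outcome == "cutoff":
            result = "cutoff"                # Remember that we hit the limit somewhere
        elif outcome is not None:
            return outcome
    return result                            # None = no solution below this node

def iterative_deepening_search(problem):
    limit = 0
    while True:                              # Restart with a larger limit each time
        root = Node(problem.initial_state, parent=None, action=None, path_cost=0.0)
        outcome = depth_limited_search(problem, root, limit)
        if outcome != "cutoff":
            return outcome                   # Either a solution or a definite failure
        limit += 1
```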
##### Bidirectional Search
Instead of searching only from the Initial State, we may search from the Goal State as well, potentially reducing our
complexity down to $O(b^{\frac{d}{2}})$
- We need two Frontiers
- Two tables of reached states
- We need a way to detect a collision (when the two frontiers meet)
It is:
- Complete if both directions are breadth-first or uniform cost and space is finite
- Optimal (as for breadth-first)
- $O(b^{\frac{d}{2}})$ Time Complexity
- $O(b^{\frac{d}{2}})$ Space Complexity
#### Informed Strategies
A strategy is informed if it uses info about the domain to make a decision.
Here we introduce a new function called the **heuristic function** $h(n)$, which is the estimated cost of
the cheapest path from the state at node $n$ to a goal state
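As an example of such a function, on a grid-based problem one could use the Manhattan distance to the goal (a made-up illustration, not tied to a specific problem in these notes):
```python
def manhattan_distance(state: tuple[int, int], goal: tuple[int, int]) -> float:
    # Estimated cost of the cheapest path from `state` to `goal` on a grid:
    # it never overestimates the true cost on a 4-connected grid, so it is admissible
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])
```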
##### Greedy best-first search
- It is a best-first search with $h(n)$ as the evaluation function
- In the best case, it expands only the nodes along the solution path
- May yield a worse outcome than other strategies
- Can amortize Complexity to $O(bm)$ with good heuristics
> [!CAUTION]
> In other words, a greedy search takes only in account the heuristic distance from a state to the goal, not the
> whole cost
> [!NOTE]
> There exists a version called **speedy search** that uses the estimated number of actions to reach the goal as its heuristic,
> regardless of the actual cost
It is:
- Complete when the state space is finite
- Non-Optimal
- $O(|V|)$ Time complexity
- $O(|V|)$ Space complexity
##### A\*
- Best-First search with the evaluation function equal to the path cost to node $n$ plus the heuristic: $f(n) = g(n) + h(n)$
- Optimality depends on the **admissibility** of its heuristic (a function that does not overestimate the cost to reach a goal)
It is:
- Complete
- Optimal as long as the heuristic is admissible (demonstration on page 106 of the book)
- $O(b^d)$ Time complexity
- $O(b^d)$ Space complexity
> [!TIP]
> While A\* is a great algorithm, we may sacrifice its optimality for a faster approach by introducing a weight over the
> heuristic function: $f(n) = g(n) + W \cdot h(n)$.
>
> The higher $W$ the faster and less accurate A\* will be
> [!NOTE]
> There exists a version called **Iterative-Deepening A\*** which cuts off at a certain value of $f(n)$ and if no goal is reached it
> restarts by increasing the cut off using the smallest cost among the nodes that went over the previous $f(n)$, practically searching over a contour
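Using the generic `best_first_search` sketched earlier, the evaluation functions for greedy search, A\* and weighted A\* could be built like this (the heuristic `h` is assumed to be provided; names are illustrative):
```python
def greedy_evaluation(h):
    # Greedy best-first: only the estimated distance to the goal matters
    return lambda node: h(node.state)

def a_star_evaluation(h):
    # A*: f(n) = g(n) + h(n), path cost so far plus estimated remaining cost
    return lambda node: node.path_cost + h(node.state)

def weighted_a_star_evaluation(h, W: float = 1.5):
    # Weighted A*: the higher W, the faster and less accurate the search
    return lambda node: node.path_cost + W * h(node.state)

# e.g. solution = best_first_search(problem, a_star_evaluation(my_heuristic))
```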
## Search Contours
See pg 107 to 110 of the book
## Memory bounded search
See pg 110 to 115
### Reference Count and Separation
Discard nodes that have been visited by all of their neighbours
### Beam Search
Discard all Frontier nodes that are not among the $k$ best, or that are more than $\delta$ away from the best candidate
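A hedged one-step sketch of the pruning rule (keeping only the $k$ most promising frontier nodes according to the evaluation function; the $\delta$ variant would additionally compare each node against the best one):
```python
def prune_frontier(frontier: list[Node], evaluation, k: int) -> list[Node]:
    # Keep only the k most promising nodes (lowest evaluation), discard the rest
    return sorted(frontier, key=evaluation)[:k]
```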
### RBFS
It's an algorithm similar to IDA\*, but faster. It tries to mimic a best-first search in linear space, but it regenerates a lot of nodes already explored.
At its core, it keeps the value of the second-best alternative in memory; every time it expands a node without reaching the goal, it overwrites the cost of that node with the cost of its cheapest child
(so that the exploration can be resumed later) and then backtracks to the second-best alternative it kept in memory, storing the other value as the new second best.
### MA\* and SMA\*
These are 2 algorithms that were designed to make use of all the available memory, addressing the "too little memory" problem of both IDA\* and RBFS
## Heuristic Functions
See pg 115 to 122