# RL

Deep RL is a type of Machine Learning where an agent learns how to behave in an environment by performing actions and seeing the results.

# What is RL

The idea behind Reinforcement Learning is that an agent (an AI) will learn from the environment by interacting with it (through trial and error) and receiving rewards (negative or positive) as feedback for performing actions.

The idea of learning from interaction comes from our own natural experience. Without any supervision, the agent improves solely by interacting with its environment.

Formally, reinforcement learning is a framework for solving control tasks (also called decision problems) by building agents that learn from the environment by interacting with it through trial and error and receiving rewards (positive or negative) as their only feedback.

# RL framework

# Example game

1. Our agent receives state $S_0$ from the environment: we receive the first frame of our game.
2. Based on that state $S_0$, the agent takes action $A_0$: our agent moves to the right.
3. The environment goes to a new state $S_1$: a new frame.
4. The environment gives some reward $R_1$ to the agent: we’re not dead (positive reward +1).
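These four steps form one turn of the agent-environment loop, which repeats until the episode ends. Below is a minimal Python sketch of that loop. The `CorridorEnv` game, its `reset`/`step` methods, the reward values, and `run_episode` are all hypothetical, chosen only to mirror the example (move right, +1 for staying alive); they are not the API of any particular library.

```python
class CorridorEnv:
    """Hypothetical toy game: the agent walks along a 1-D corridor of `length` cells.

    Stepping left of cell 0 means falling off (reward -1, episode over).
    Every other step earns +1 ("we're not dead"); reaching the end also ends the episode.
    """

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Start a new episode and return the first state S_0."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply action A_t and return (S_{t+1}, R_{t+1}, done)."""
        self.position += 1 if action == "right" else -1
        if self.position < 0:
            return self.position, -1, True  # fell off: negative reward, episode over
        done = self.position >= self.length
        return self.position, 1, done  # still alive: +1


def run_episode(env, policy, max_steps=100):
    """Run the state -> action -> reward/next-state loop; return (S_t, A_t, R_{t+1}) tuples."""
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def always_right(state):
    """A trivial policy that ignores the state and always moves right."""
    return "right"


trajectory = run_episode(CorridorEnv(length=3), always_right)
print(trajectory)  # [(0, 'right', 1), (1, 'right', 1), (2, 'right', 1)]
```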

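The agent’s goal is to maximize its cumulative reward, called the return. Because the agent’s actions and the environment’s transitions can be random, what it actually maximizes is the expected return.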
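As a rough illustration, the return of one episode is the sum of its rewards, $R_1 + R_2 + \dots + R_T$, and the expected return can be estimated by averaging returns over many sampled episodes (a Monte Carlo estimate). The sketch below assumes undiscounted rewards; the function names are hypothetical and the episode reward lists are made-up numbers.

```python
def episode_return(rewards):
    """Return of one episode: the undiscounted sum R_1 + R_2 + ... + R_T."""
    return sum(rewards)


def estimate_expected_return(episodes):
    """Monte Carlo estimate of the expected return: mean return over sampled episodes."""
    returns = [episode_return(rewards) for rewards in episodes]
    return sum(returns) / len(returns)


# Made-up reward sequences from three hypothetical episodes.
episodes = [[1, 1, 1], [1, -1], [1, 1, 1, 1]]
print(episode_return(episodes[0]))         # 3
print(estimate_expected_return(episodes))  # (3 + 0 + 4) / 3, about 2.33
```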

# Reward hypothesis

RL is based on the reward hypothesis: all goals can be described as the maximization of the expected return.

# MDP

The Markov property implies that our agent needs only the current state to decide what action to take, not the history of all the states and actions it took before.
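Using the $S_t$, $A_t$ notation from the example game, one common formal statement of the Markov property is that the next state depends only on the current state and action, not on the earlier history:

$$
P(S_{t+1} \mid S_t, A_t) = P(S_{t+1} \mid S_0, A_0, S_1, A_1, \dots, S_t, A_t)
$$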

# Obs/State space

Observations/states are the information our agent gets from the environment. In the case of a video game, it can be a frame (a screenshot). In the case of a trading agent, it can be the value of a certain stock.

- State $s$: a complete description of the state of the world, with no hidden information (e.g. a chess board).
- Observation $o$: a partial description of the state, as in a partially observed environment (e.g. a platformer game frame, which shows only part of the level).
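To make the distinction concrete, here is a hedged Python sketch built on a hypothetical 1-D platformer level. The state bundles the whole level with the agent's position, while the observation is only a window of `view_radius` cells around the agent, like a frame that shows part of the level. The level layout, `get_state`, `get_observation`, and `view_radius` are illustrative assumptions.

```python
def get_state(level, agent_x):
    """State s: a complete description of the world (entire level plus agent position)."""
    return tuple(level), agent_x


def get_observation(level, agent_x, view_radius=2):
    """Observation o: only the cells within `view_radius` of the agent are visible."""
    start = max(0, agent_x - view_radius)
    end = min(len(level), agent_x + view_radius + 1)
    return level[start:end]


# Hypothetical 1-D level: '.' floor, '#' obstacle, 'G' goal.
level = list("..#...#..G")
state = get_state(level, agent_x=4)
observation = get_observation(level, agent_x=4)
print(observation)         # ['#', '.', '.', '.', '#']
print("G" in observation)  # False: the goal is hidden from this observation
```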