qmdp_agent
QMDP_Agent
Bases: PBVI_Agent
An agent that relies on Model-Based Reinforcement Learning. It is a simplified version of the PBVI_Agent: it runs a Value Iteration solver, assuming full observability, and the resulting value function is then used to make choices.
As stated, during simulations, the agent chooses actions with an argmax over the matrix product of the value function with the belief vector, picking the action with the highest expected value.
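As an illustration, this action-selection rule can be sketched as follows. This is a minimal NumPy sketch; the function and array names (`qmdp_action`, `q_values`, `belief`) are assumptions for illustration, not the package's actual API:

```python
import numpy as np

def qmdp_action(q_values: np.ndarray, belief: np.ndarray) -> int:
    # q_values: (n_states, n_actions) value function from the fully
    #           observable Value Iteration solve.
    # belief:   (n_states,) probability vector over states.
    # Expected value of each action under the belief:
    # (n_states,) @ (n_states, n_actions) -> (n_actions,)
    action_values = belief @ q_values
    return int(np.argmax(action_values))

# Toy example: action 0 pays off in state 0, action 1 in state 1.
q = np.array([[1.0, 0.0],
              [0.0, 2.0]])
b = np.array([0.7, 0.3])
print(qmdp_action(q, b))  # expected values [0.7, 0.6] -> action 0
```

Because the value function is computed under full observability, this rule ignores the value of information-gathering actions, which is the usual trade-off of the QMDP approximation.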
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| environment | Environment | The olfactory environment to train the agent with. | required |
| threshold | float or list[float] | The olfactory threshold. If an odor cue above this threshold is detected, the agent detects it; otherwise it does not. If a list of thresholds is provided, the agent will be able to detect \|thresholds\|+1 levels of odor. | 3e-6 |
| actions | dict or ndarray | The set of actions available to the agent. It should match the type of environment (i.e.: if the environment has layers, the action vectors should contain a layer component, and similarly for a third dimension). Otherwise, a dict of strings and action vectors, where the strings represent the action labels. If none is provided, by default, all unit movement vectors are included, and this for all layers (if the environment has layers). | None |
| name | str | A custom name to give the agent. If not provided, it will be a combination of the class name and the threshold. | None |
| seed | int | For reproducible randomness. | 12131415 |
| model | Model | A POMDP model to use to represent the olfactory environment. If not provided, the environment_converter parameter will be used. | None |
| environment_converter | Callable | A function to convert the olfactory environment instance to a POMDP Model instance. By default, an exact conversion is used that keeps the shape of the environment as the amount of states of the POMDP Model. This parameter is ignored if the model parameter is provided. | exact_converter |
| converter_parameters | dict | A set of additional parameters to be passed down to the environment converter. | {} |
Attributes:

| Name | Type | Description |
|---|---|---|
| environment | Environment | |
| threshold | float or list[float] | |
| name | str | |
| action_set | ndarray | The actions allowed for the agent, formulated as movement vectors [(layer,) (dz,) dy, dx]. |
| action_labels | list[str] | The labels associated with the action vectors present in the action set. |
| model | Model | The environment converted to a POMDP model using the "from_environment" constructor of the pomdp.Model class. |
| saved_at | str | The place on disk where the agent has been saved (None if not saved yet). |
| on_gpu | bool | Whether the agent has been sent to the GPU or not. |
| class_name | str | The name of the class of the agent. |
| seed | int | The seed used for the random operations (to allow for reproducibility). |
| rnd_state | RandomState | The random state variable used to generate random values. |
| trained_at | str | A string timestamp of when the agent was trained (None if not trained yet). |
| value_function | ValueFunction | The value function the agent uses to make decisions. |
| belief | BeliefSet | Used only during simulations; part of the agent's status. Where the agent believes it is over the state space. A list of n belief points, based on how many simulations are running at once. |
| action_played | list[int] | Used only during simulations; part of the agent's status. Records which action was last played by the agent. A list of n actions played, based on how many simulations are running at once. |
Source code in olfactory_navigation/agents/qmdp_agent.py
train(expansions, initial_value_function=None, gamma=0.99, eps=1e-06, use_gpu=False, history_tracking_level=1, overwrite_training=False, print_progress=True, print_stats=True)
Simplified version of the training. It consists of running the Value Iteration process.
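The Value Iteration loop this method runs can be sketched as follows. This is a minimal sketch under assumed array shapes (`T` as `(n_actions, n_states, n_states)` transition probabilities, `R` as `(n_states, n_actions)` expected rewards), not the package's actual implementation:

```python
import numpy as np

def value_iteration(T, R, gamma=0.99, eps=1e-6, max_iter=10_000):
    # T: (n_actions, n_states, n_states) transition probabilities (assumed shape)
    # R: (n_states, n_actions) expected rewards (assumed shape)
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q(s, a) = R(s, a) + gamma * sum_s' T(a, s, s') * V(s')
        Q = R + gamma * np.einsum('ast,t->sa', T, V)
        V_new = Q.max(axis=1)
        # Stop early once the largest change drops below eps (convergence).
        if np.max(np.abs(V_new - V)) < eps:
            V = V_new
            break
        V = V_new
    return V, Q

# Toy MDP: 2 states, 2 actions, deterministic self-transitions;
# action 0 in state 0 is the only rewarding choice.
T = np.stack([np.eye(2), np.eye(2)])
R = np.array([[1.0, 0.0],
              [0.0, 0.0]])
V, Q = value_iteration(T, R)
print(V[0])  # close to 1 / (1 - 0.99) = 100
```

The resulting `Q` matrix is what the QMDP rule above combines with the belief vector at decision time.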
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| expansions | int | How many iterations to run the Value Iteration process for. | required |
| initial_value_function | ValueFunction | An initial value function to start the solving process with. | None |
| use_gpu | bool | Whether to use the GPU with cupy arrays to accelerate solving. | False |
| gamma | float | The discount factor to value immediate rewards more than long-term rewards. The learning rate is 1/gamma. | 0.99 |
| eps | float | The smallest allowed change for the value function. Below this amount of change, the value function is considered converged and the value iteration process ends early. | 1e-6 |
| history_tracking_level | int | How thorough the tracking of the solving process should be. (0: Nothing; 1: Times and sizes of belief sets and value functions; 2: The actual value functions and belief sets) | 1 |
| overwrite_training | bool | Whether to force the overwriting of the training if a value function already exists for this agent. | False |
| print_progress | bool | Whether or not to print out the progress of the value iteration process. | True |
| print_stats | bool | Whether or not to print out statistics at the end of the training run. | True |
Returns:

| Name | Type | Description |
|---|---|---|
| solver_history | SolverHistory | The history of the solving process with some plotting options. |
Source code in olfactory_navigation/agents/qmdp_agent.py