What tests are available?
In this article, we present the various test setups that have been built to facilitate the testing of different performance aspects of an agent. All of these tests are built upon the run_test() function, which is described in more detail in the "How does a simulation work?" tutorial, and all of them are part of the test_setups module.
These tests include:
All start test
Function: run_all_starts_test() reference
The "all starts" test allows to quick test all possible start positions of the agent in space. Since the environment has a start_probabilities attribute, we can use it to check all the positions where the probability is higher than zero. With this test, we can easily test if there are areas in space where the agent performs better than other areas.
For example, in a 2-dimensional environment, we can plot the start positions that lead to a successful trajectory in blue and the rest in red:
N start per cell
Function: run_n_by_cell_test() reference
In the case where the environment is too detailed, and would therefore require too many points to test the whole starting area, we can divide the space into "cells" in which we sample a number of points n at random, and average the performance over these cells. This test also becomes useful when we want a more statistically significant number of trajectories to draw conclusions about the performance of an agent.
For example, in a 2-dimensional environment, we can divide the space into a 3 by 26 grid of cells and draw 20 random starting points in each cell:
And over each cell, we can average the number of steps required to reach the source, leading to the following heatmap:
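A minimal sketch of the per-cell averaging, using the 3 by 26 grid and 20 points per cell from the example above. The steps_to_source() function is a toy stand-in (in the real test, each sampled point would be simulated and its step count recorded), so the numbers themselves are not meaningful:

```python
import numpy as np

rng = np.random.default_rng(0)

n_rows, n_cols, n_per_cell = 3, 26, 20

def steps_to_source(x, y):
    """Stand-in for running one simulation and counting its steps."""
    return int(10 + 5 * x + 2 * y)  # toy step count, grows away from origin

heatmap = np.zeros((n_rows, n_cols))
for i in range(n_rows):
    for j in range(n_cols):
        # Draw n random start points uniformly inside cell (i, j).
        xs = rng.uniform(i, i + 1, n_per_cell)
        ys = rng.uniform(j, j + 1, n_per_cell)
        # Average the step counts of the n trajectories of this cell.
        heatmap[i, j] = np.mean([steps_to_source(x, y) for x, y in zip(xs, ys)])

print(heatmap.shape)  # (3, 26)
```

The resulting (3, 26) array is what gets rendered as the heatmap.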
Shape robustness test
Function: test_shape_robustness() reference
This test is meant to test the robustness of an agent to changes in the shape of the olfactory plume compared to the one it was trained on. By default, the test modifies the shape in the x and y directions in increments of 20%, from 20% to 200% of the original size. For each modified environment, an "All start test" is run to evaluate the performance. Unlike the tests defined previously, this test returns a list of SimulationHistory instances rather than a single one.
Once the test is done, the list of SimulationHistory instances can be analyzed with the analyse_shape_robustness() function, after which a heatmap of the performance can be plotted for each modification.
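The sweep over shape modifications can be sketched like this. The all_starts_success_rate() function and its degradation model are assumptions made purely for illustration; in the real test, each (x, y) factor pair produces a reshaped environment on which the "All start test" is actually run:

```python
import numpy as np

# Stretch factors from 20% to 200% in increments of 20% (the default).
factors = np.arange(0.2, 2.01, 0.2)

def all_starts_success_rate(x_factor, y_factor):
    """Stand-in for an 'All start test' on the reshaped plume."""
    # Toy model: performance degrades the further the shape is from 1.0x.
    return max(0.0, 1.0 - 0.4 * (abs(x_factor - 1) + abs(y_factor - 1)))

# One evaluation per (x stretch, y stretch) combination.
performance = np.array([
    [all_starts_success_rate(fx, fy) for fy in factors]
    for fx in factors
])

print(performance.shape)  # (10, 10)
```

The resulting 10 by 10 matrix of performances is what the post-analysis heatmap displays, with the unmodified shape (1.0x, 1.0x) sitting at the center.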
Scale robustness test
Function: test_scale_robustness() reference
This test works in a similar fashion to the "Shape robustness test", in that it modifies the environment and runs simulations with the "All start test" for a series of values. In this scenario, however, instead of modifying the shape of the olfactory plume, we modify the sampling of the olfactory navigation environment. Imagine the agent has been trained on an n by n environment; we then test its performance on, for instance, a 10n by 10n environment and an n/10 by n/10 environment. These environments therefore have more or less granularity and detail while maintaining the same olfactory probabilities.
By default, as for the previous test, the environment scale is modified in increments of 20%, from 20% to 200% of the original scale. After analyzing the list of SimulationHistory instances with the analyse_scale_robustness() function, we can plot the results as follows:
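To make the resampling idea concrete, here is a toy sketch, under the assumption that rescaling amounts to nearest-neighbour resampling of the plume field onto a coarser or finer grid (the rescale() helper and the plume values are hypothetical, not the library's code):

```python
import numpy as np

# Toy plume field on a 3x3 grid.
plume = np.array([
    [0.0, 0.2, 0.0],
    [0.1, 0.4, 0.1],
    [0.0, 0.2, 0.0],
])

def rescale(field, factor):
    """Nearest-neighbour resampling of a square 2D field by a scale factor."""
    n = field.shape[0]
    new_n = max(1, int(round(n * factor)))
    # Map each new cell back to its nearest source cell.
    idx = np.minimum((np.arange(new_n) / factor).astype(int), n - 1)
    return field[np.ix_(idx, idx)]

fine = rescale(plume, 2.0)      # 6x6: more granularity, same plume
coarse = rescale(plume, 1 / 3)  # 1x1: less granularity, same plume
print(fine.shape, coarse.shape)
```

Each rescaled environment would then be evaluated with the "All start test", and the resulting performances plotted against the scale factor.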