Configuration
The flakefighters plugin implements several cutting-edge flaky test detection tools from the research community.
Each one is configured separately and can be run on its own or alongside other flakefighters.
By default, the plugin will only use the DeFlaker algorithm to classify flaky tests.
You can control which flakefighters to run and provide additional configuration options from your pyproject.toml file by including sections of the following form for each flakefighter you want to run.
Here, <FlakeFighterClass> is the fully qualified class path of the flakefighter you wish to configure, written as you would to import it into a source code file (for example, pytest_flakefighters.flakefighters.deflaker.DeFlaker).
[tool.pytest.ini_options.pytest_flakefighters.flakefighters.deflaker.DeFlaker]
run_live=true # run the classifier immediately after each test
active=false # turn off the flakefighter (use active=true, or leave unspecified to turn it on)
[tool.pytest.ini_options.pytest_flakefighters.flakefighters.traceback_matching.TracebackMatching]
run_live=false # run the classifier at the end of the test suite
[tool.pytest.ini_options.pytest_flakefighters.flakefighters.traceback_matching.CosineSimilarity]
run_live=false # run the classifier at the end of the test suite
threshold=0.8 # Cosine similarity >= 0.8 is classed as a match
[tool.pytest.ini_options.pytest_flakefighters.flakefighters.coverage_independence.CoverageIndependence]
run_live=false # run the classifier at the end of the test suite
threshold=0.1 # Distance <= 0.1 is classed as "similar"
metric="hamming" # Use Hamming distance (TOML strings must be quoted)
linkage_method="complete" # Use complete linkage for clustering
Note
The above configuration is just an example meant to demonstrate the various parameters that can be supplied, and is not a recommendation or “default”.
You should choose the parameter values that are appropriate for your project, especially threshold values for CosineSimilarity and CoverageIndependence.
The default behaviour of the plugin is to run every flakefighter with a specified configuration.
However, there are two ways to toggle flakefighters on and off.
Firstly, you can add active=false to the configuration, as for DeFlaker in the above example configuration.
Secondly, you can use the --active-flakefighters command-line argument, e.g. --active-flakefighters DeFlaker CosineSimilarity would run just the DeFlaker and CosineSimilarity flakefighters.
Note that the command-line argument overrides the value of active specified in the configuration file.
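The precedence between the two mechanisms can be sketched as a small function. This is an illustration only, not the plugin's code: resolve_active and the dictionary layout are hypothetical, and the sketch assumes that any flakefighter named on the command line runs even when the configuration file sets active=false for it.

```python
def resolve_active(configured, cli_names=None):
    """Return the sorted names of the flakefighters that should run.

    configured: maps a flakefighter name to its options from the configuration file.
    cli_names:  names given via --active-flakefighters, or None if the flag is absent.
    """
    if cli_names is not None:
        # Assumption: the command line takes precedence over any `active` value.
        return sorted(set(cli_names))
    # Without the flag, a configured flakefighter runs unless active is false.
    return sorted(
        name for name, options in configured.items() if options.get("active", True)
    )


configured = {
    "DeFlaker": {"run_live": True, "active": False},
    "TracebackMatching": {"run_live": False},
    "CosineSimilarity": {"run_live": False, "threshold": 0.8},
}

print(resolve_active(configured))
# ['CosineSimilarity', 'TracebackMatching']
print(resolve_active(configured, ["DeFlaker", "CosineSimilarity"]))
# ['CosineSimilarity', 'DeFlaker']
```

In the second call DeFlaker runs despite active=false in the configuration, which is the override described above.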
Every flakefighter has a run_live option, which can be set to true to classify each test execution immediately after it is run, or false to classify all tests at once at the end of the test suite. Some flakefighters support only one of these modes.
Individual flakefighters have their own configurable options.
These are detailed below.
- class pytest_flakefighters.flakefighters.coverage_independence.CoverageIndependence(threshold: float = 0, metric: str = 'jaccard', linkage_method='single')
Classify tests as flaky if they fail independently of passing test cases that exercise overlapping code.
Note
To use this flakefighter, you will need to install scipy. We do not include this by default as it can be problematic to install on Windows. You can do this by running
pip install pytest-flakefighters[scipy] or pip install scipy from within your virtual environment.
- Variables:
run_live – Run detection “live” after each test. Otherwise run as a postprocessing step after the test suite. This is always False as live classification is not supported.
threshold – The maximum distance to consider as “similar”, expressed as a proportion 0 <= threshold < 1 where 0 represents no difference and 1 represents complete difference.
metric – The distance metric to use. For a full list of valid values, see the SciPy documentation for spatial distances.
linkage_method – The linkage method, as defined by scipy.cluster.hierarchy.linkage. One of ‘single’, ‘complete’, ‘average’, ‘weighted’, ‘centroid’, ‘median’, ‘ward’.
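To give a feel for how metric and threshold interact, the sketch below computes the Hamming and Jaccard distances between two boolean coverage vectors in plain Python, following the usual definitions for boolean vectors. It is a simplified illustration: the function names and the toy coverage data are hypothetical, the hierarchical clustering step governed by linkage_method is left out, and the plugin itself relies on SciPy rather than on code like this.

```python
def hamming_distance(u, v):
    """Proportion of positions at which two equal-length boolean vectors disagree."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if not u:
        return 0.0
    return sum(bool(a) != bool(b) for a, b in zip(u, v)) / len(u)


def jaccard_distance(u, v):
    """Disagreeing positions divided by positions where at least one vector is true."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    disagree = sum(bool(a) != bool(b) for a, b in zip(u, v))
    either = sum(bool(a) or bool(b) for a, b in zip(u, v))
    # Assumption: two all-false vectors are treated as identical (distance 0).
    return disagree / either if either else 0.0


def is_similar(distance, threshold):
    """Two tests count as similar when their distance does not exceed the threshold."""
    return distance <= threshold


# Hypothetical data: one entry per source line, True if the test executed that line.
test_a = [True, True, True, False, False]
test_b = [True, True, False, False, False]

h = hamming_distance(test_a, test_b)  # 1 of 5 positions differ
j = jaccard_distance(test_a, test_b)  # 1 differing position out of 3 covered lines

print(h, is_similar(h, threshold=0.1))
print(j, is_similar(j, threshold=0.1))
```

The same pair of tests yields a different distance under each metric, so a threshold chosen for one metric should not be carried over to another without checking.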
- class pytest_flakefighters.flakefighters.deflaker.DeFlaker(run_live: bool, root: str = '.', source_commit: str | None = None, target_commit: str | None = None)
A Python equivalent of the DeFlaker algorithm from Bell et al. (2019). Given the subtle differences between JUnit and pytest, this is not intended to be an exact port, but it follows the same general methodology of checking whether covered code has been changed between commits.
- Variables:
run_live – Run detection “live” after each test. Otherwise run as a postprocessing step after the test suite.
root – The root directory of the Git repository.
source_commit – The source (older) commit hash. Defaults to HEAD^ (the previous commit to target).
target_commit – The target (newer) commit hash. Defaults to HEAD (the most recent commit).
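The general methodology can be sketched as follows: a failing test whose covered lines do not overlap with the lines changed between the source and target commits is a candidate flaky failure. This is a minimal sketch of that idea only. The function names, the file paths, and the dictionary layout are hypothetical, and the real class works from Git diffs and coverage data rather than hand-written sets.

```python
def covers_changed_code(covered, changed):
    """True if the test executed at least one line that changed between commits.

    Both arguments map a file path to a set of line numbers.
    """
    return any(covered.get(path, set()) & lines for path, lines in changed.items())


def looks_flaky(failed, covered, changed):
    """A failure that touches no changed code cannot be explained by the change."""
    return failed and not covers_changed_code(covered, changed)


# Hypothetical data: lines 10 and 11 of src/app.py differ between the two commits.
changed = {"src/app.py": {10, 11}}

covered_unrelated = {"src/app.py": {1, 2, 3}, "src/util.py": {5}}
covered_related = {"src/app.py": {9, 10}}

print(looks_flaky(True, covered_unrelated, changed))   # True
print(looks_flaky(True, covered_related, changed))     # False
print(looks_flaky(False, covered_unrelated, changed))  # False
```

Only failing tests are candidates; a failure that does cover changed code is treated as a potentially genuine regression.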
- class pytest_flakefighters.flakefighters.traceback_matching.TracebackMatching(run_live: bool, previous_runs: list[Run], root: str = '.')
Simple text-based matching classifier from Section II.A of Alshammari et al. (2024). We implement text-based matching on the failure logs for each test. Each failure log is represented by its failure exception and stacktrace.
- Variables:
run_live – Run detection “live” after each test. Otherwise run as a postprocessing step after the test suite.
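As a rough illustration of text-based matching, the sketch below represents each failure by its exception and stack trace and checks for an exact match against failures previously classified as flaky. The function names, the sample traces, and the whitespace normalisation are assumptions made for this example; the class itself may represent and compare failure logs differently.

```python
def failure_signature(exception, stacktrace):
    """Represent a failure by its exception text and its stack trace lines."""
    lines = tuple(line.strip() for line in stacktrace.strip().splitlines())
    return (exception.strip(), lines)


def matches_known_flaky(exception, stacktrace, known_flaky):
    """True if this failure is textually identical to a previously flaky failure."""
    return failure_signature(exception, stacktrace) in known_flaky


# Hypothetical failure recorded as flaky in an earlier run.
known_flaky = {
    failure_signature(
        "TimeoutError: request timed out",
        "tests/test_api.py:12 in test_fetch\nsrc/client.py:48 in get",
    )
}

same_failure = matches_known_flaky(
    "TimeoutError: request timed out",
    "  tests/test_api.py:12 in test_fetch\n  src/client.py:48 in get\n",
    known_flaky,
)
new_failure = matches_known_flaky(
    "AssertionError: totals differ",
    "tests/test_cart.py:30 in test_total",
    known_flaky,
)

print(same_failure)  # True
print(new_failure)   # False
```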
- class pytest_flakefighters.flakefighters.traceback_matching.CosineSimilarity(run_live: bool, previous_runs: list[Run], root: str = '.', threshold: float = 1)
TF-IDF cosine similarity matching classifier from Section II.C of Alshammari et al. (2024). Test executions are classified as flaky if the stack trace is sufficiently similar to a previous flaky execution.
- Variables:
run_live – Run detection “live” after each test. Otherwise run as a postprocessing step after the test suite.
previous_runs – List of previous FlakeFighters runs.
root – The root directory of the code repository.
threshold – The minimum cosine similarity to consider as a match, expressed as a proportion 0 <= threshold <= 1 where 1 represents identical stack traces and 0 represents no similarity.
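The following sketch shows one way TF-IDF cosine similarity between stack traces can be computed and compared against a threshold such as the 0.8 used in the example configuration. It is a simplified, pure-Python illustration under stated assumptions: whitespace tokenisation and the smoothed IDF weighting ln((1 + n) / (1 + df)) + 1 are choices made for this example, all names and sample traces are hypothetical, and the plugin's own implementation may differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Assumption: lower-cased whitespace tokens are enough for illustration."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical traces: a known flaky failure, a repeat of it, and an unrelated one.
known_flaky = "ConnectionError in fetch_data line 42 timeout"
repeat = "ConnectionError in fetch_data line 42 timeout"
unrelated = "AssertionError: totals differ at checkout"

v_known, v_repeat, v_unrelated = tfidf_vectors([known_flaky, repeat, unrelated])

threshold = 0.8
print(cosine_similarity(v_known, v_repeat) >= threshold)     # True
print(cosine_similarity(v_known, v_unrelated) >= threshold)  # False
```

With the default threshold of 1, only traces whose TF-IDF vectors point in exactly the same direction would match, so lowering the threshold is what allows near-identical traces to be classified together.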