Extending the library

A key advantage of the subgroups Python library is that it is easily extensible: users can add new quality measures, data structures and algorithms.

Every new piece of functionality added to the library must come with corresponding tests in the tests folder, which verify that it is implemented correctly and works as expected.
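As a minimal sketch of what such a test might look like, the following uses the standard unittest module to check the WRAcc formula shown later in this section. The standalone wracc function and the TestWRAcc class are hypothetical stand-ins so that the example runs without the library installed; an actual test would import the class under test from subgroups and follow the layout of the existing files in the tests folder.

```python
import unittest

def wracc(tp: int, fp: int, TP: int, FP: int) -> float:
    """Hypothetical stand-in for the measure under test (same formula as WRAcc.compute)."""
    return ((tp + fp) / (TP + FP)) * ((tp / (tp + fp)) - (TP / (TP + FP)))

class TestWRAcc(unittest.TestCase):
    def test_known_value(self) -> None:
        # 40 of 100 instances are covered, 30 of them positive; 50 positives overall.
        self.assertAlmostEqual(wracc(tp=30, fp=10, TP=50, FP=50), 0.1)

    def test_subgroup_covering_whole_dataset(self) -> None:
        # A subgroup equal to the whole dataset has the same target share as the dataset.
        self.assertAlmostEqual(wracc(tp=50, fp=50, TP=50, FP=50), 0.0)
```

Saved in a file such as test_wracc.py (an assumed name), these tests can be run with python -m unittest.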

Adding a new quality measure

This example uses the WRAcc quality measure to show how to add a new quality measure to the library.

The first step is to create a Python file in the quality_measures folder, named after the quality measure to implement (wracc.py in this case). Note that the file name is always in lowercase.

Then, the file content must be the following:

"""This file contains the implementation of the Weighted Relative Accuracy (WRAcc) quality measure.
"""

from subgroups.quality_measures.quality_measure import QualityMeasure
from subgroups.exceptions import SubgroupParameterNotFoundError

# Python annotations.
from typing import Union

class WRAcc(QualityMeasure):
    """This class defines the Weighted Relative Accuracy (WRAcc) quality measure.
    """

    _singleton = None
    __slots__ = ()

    def __new__(cls) -> 'WRAcc':
        if WRAcc._singleton is None:
            WRAcc._singleton = object().__new__(cls)
        return WRAcc._singleton

    def compute(self, dict_of_parameters : dict[str, Union[int, float]]) -> float:
        """Method to compute the WRAcc quality measure (you can also call to the instance for this purpose).

        :param dict_of_parameters: python dictionary which contains all the necessary parameters used to compute this quality measure.
        :return: the computed value for the WRAcc quality measure.
        """
        if type(dict_of_parameters) is not dict:
            raise TypeError("The type of the parameter 'dict_of_parameters' must be 'dict'.")
        if (QualityMeasure.TRUE_POSITIVES not in dict_of_parameters):
            raise SubgroupParameterNotFoundError("The subgroup parameter 'tp' is not in 'dict_of_parameters'.")
        if (QualityMeasure.FALSE_POSITIVES not in dict_of_parameters):
            raise SubgroupParameterNotFoundError("The subgroup parameter 'fp' is not in 'dict_of_parameters'.")
        if (QualityMeasure.TRUE_POPULATION not in dict_of_parameters):
            raise SubgroupParameterNotFoundError("The subgroup parameter 'TP' is not in 'dict_of_parameters'.")
        if (QualityMeasure.FALSE_POPULATION not in dict_of_parameters):
            raise SubgroupParameterNotFoundError("The subgroup parameter 'FP' is not in 'dict_of_parameters'.")
        tp = dict_of_parameters[QualityMeasure.TRUE_POSITIVES]
        fp = dict_of_parameters[QualityMeasure.FALSE_POSITIVES]
        TP = dict_of_parameters[QualityMeasure.TRUE_POPULATION]
        FP = dict_of_parameters[QualityMeasure.FALSE_POPULATION]
        return ( (tp+fp) / (TP+FP) ) * ( ( tp / (tp+fp) ) - ( TP / (TP+FP) ) )

    def get_name(self) -> str:
        """Method to get the quality measure name (equal to the class name).
        """
        return "WRAcc"

    def optimistic_estimate_of(self) -> dict[str, QualityMeasure]:
        """Method to get a python dictionary with the quality measures of which this one is an optimistic estimate.

        :return: a python dictionary in which the keys are the quality measure names and the values are the instances of those quality measures.
        """
        return dict()

    def __call__(self, dict_of_parameters : dict[str, Union[int, float]]) -> float:
        """Compute the WRAcc quality measure.

        :param dict_of_parameters: python dictionary which contains all the needed parameters with which to compute this quality measure.
        :return: the computed value for the WRAcc quality measure.
        """
        return self.compute(dict_of_parameters)

This file contains only one class, which inherits from the QualityMeasure abstract class and is named after the quality measure to implement, WRAcc in this case. Since this class is a singleton, it contains a class attribute called _singleton and the __new__ method shown in the previous code. The class also overrides the compute, get_name, optimistic_estimate_of and __call__ methods.
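To illustrate the singleton behaviour and the formula with concrete numbers, here is a standalone sketch that does not import the library. The class name WRAccSketch is hypothetical, and the dictionary keys "tp", "fp", "TP" and "FP" are an assumption based on the error messages in the code above; the real class reads its keys from the QualityMeasure constants.

```python
class WRAccSketch:
    """Standalone stand-in for the WRAcc class (hypothetical name, no library imports)."""

    _singleton = None
    __slots__ = ()

    def __new__(cls) -> "WRAccSketch":
        # Create the instance only on the first call; later calls reuse it.
        if WRAccSketch._singleton is None:
            WRAccSketch._singleton = object.__new__(cls)
        return WRAccSketch._singleton

    def compute(self, params: dict) -> float:
        # Assumed keys: "tp", "fp" (subgroup counts) and "TP", "FP" (dataset counts).
        tp, fp, TP, FP = params["tp"], params["fp"], params["TP"], params["FP"]
        return ((tp + fp) / (TP + FP)) * ((tp / (tp + fp)) - (TP / (TP + FP)))

    def __call__(self, params: dict) -> float:
        # Calling the instance is a shortcut for compute.
        return self.compute(params)

first = WRAccSketch()
second = WRAccSketch()
same_instance = first is second  # both names refer to the single instance

# 40 of 100 instances are covered (30 positive, 10 negative); 50 of 100 are positive overall.
value = first({"tp": 30, "fp": 10, "TP": 50, "FP": 50})  # 0.4 * (0.75 - 0.5)
```

Note that the formula divides by tp + fp, so, as written, a subgroup that covers no instances raises a ZeroDivisionError.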

After that, the last step is to add the following line in the quality_measures/__init__.py file:

from subgroups.quality_measures.wracc import WRAcc

Adding a new data structure

The first step is to create a Python file in the data_structures folder, named after the data structure to implement. Remember that the file name is always in lowercase. The only implementation restriction is that the file must contain exactly one class.

After that, the last step is to add the corresponding import line to the data_structures/__init__.py file. For the subgroup list data structure, for example, the line is:

from subgroups.data_structures.subgroup_list import SubgroupList

Adding a new algorithm

This example uses the VLSD algorithm to show how to add a new algorithm to the library.

The first step is to create a Python file in either the algorithms/subgroup_sets folder or the algorithms/subgroup_lists folder, depending on the type of algorithm to implement. The file is named after the algorithm, vlsd.py in this case. Note that the file name is always in lowercase.

This file contains only one class, which inherits from the Algorithm abstract class and is named after the algorithm to implement, VLSD in this case. This class overrides the fit method, whose definition is as follows:

def fit(self, pandas_dataframe : DataFrame, target : tuple[str, str]) -> None:
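The overall shape of such a class can be sketched as follows. This is not the VLSD implementation: AlgorithmSketch is a hypothetical stand-in for the library's Algorithm abstract class, TargetCounter is a hypothetical toy algorithm, and reading target as an (attribute name, value) pair is an assumption. Only the fit signature is taken from the text above.

```python
from abc import ABC, abstractmethod
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported for annotations only, so the sketch has no runtime pandas dependency.
    from pandas import DataFrame

class AlgorithmSketch(ABC):
    """Hypothetical stand-in for the library's Algorithm abstract class."""

    @abstractmethod
    def fit(self, pandas_dataframe: "DataFrame", target: tuple[str, str]) -> None:
        ...

class TargetCounter(AlgorithmSketch):
    """Hypothetical toy 'algorithm' that only counts the rows matching the target."""

    def __init__(self) -> None:
        self.matching_rows = 0

    def fit(self, pandas_dataframe: "DataFrame", target: tuple[str, str]) -> None:
        if type(target) is not tuple or len(target) != 2:
            raise TypeError("The parameter 'target' must be a tuple (attribute name, value).")
        attribute_name, value = target
        # Iterating over a column yields its values one by one.
        self.matching_rows = sum(1 for v in pandas_dataframe[attribute_name] if v == value)
```

A real algorithm would store its discovered subgroups during fit and expose them through its own attributes; the counter here only shows where that logic would go.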

After that, the last step is to add the following line in the algorithms/__init__.py file:

from subgroups.algorithms.subgroup_sets.vlsd import VLSD