subgroups.algorithms.subgroup_lists package
Submodules
subgroups.algorithms.subgroup_lists.dslm module
This file contains the implementation of the DSLM algorithm.
- class subgroups.algorithms.subgroup_lists.dslm.DSLM(input_file_path, max_sl, sl_max_size, beta, maximum_positive_overlap, maximum_negative_overlap, output_file_path)[source]
Bases: PSLD
This class represents the DSLM algorithm.
- Parameters:
input_file_path (str) – path of the file from which the subgroups and their bitarrays will be read.
max_sl (int) – maximum number of subgroup lists to generate.
sl_max_size (int) – maximum number of subgroups that each subgroup list will contain.
beta (float) – level of normalization of the compression gain.
maximum_positive_overlap (float) – maximum positive overlap factor permitted to add a subgroup candidate to the subgroup list (i.e., a subgroup candidate will be added to the subgroup list only if its positive overlap factor is less than or equal to maximum_positive_overlap). Values close to 0 are stricter and admit only candidates with little overlap, while values close to 1 admit candidates with more overlap.
maximum_negative_overlap (float) – maximum negative overlap factor permitted to add a subgroup candidate to the subgroup list (i.e., a subgroup candidate will be added to the subgroup list only if its negative overlap factor is less than or equal to maximum_negative_overlap). Values close to 0 are stricter and admit only candidates with little overlap, while values close to 1 admit candidates with more overlap.
output_file_path (str) – path of the file in which the results will be written.
- fit(pandas_dataframe, target)[source]
Main method to run the DSLM algorithm. This algorithm only supports nominal attributes (i.e., type ‘str’). IMPORTANT: missing values are not supported.
- Parameters:
pandas_dataframe (pandas.core.frame.DataFrame) – the DataFrame which is scanned. This algorithm only supports nominal attributes (i.e., type ‘str’). IMPORTANT: missing values are not supported.
target (tuple[str, str]) – a tuple with 2 elements: the target attribute name and the target value.
- Return type:
None
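The constraints on fit (nominal attributes of type ‘str’ only, no missing values, a 2-element target tuple) can be checked before calling it. The following is a minimal sketch of such a pre-check. The helper name check_fit_inputs is hypothetical and is not part of the subgroups library, and the library's own validation may differ.

```python
import pandas as pd


def check_fit_inputs(pandas_dataframe, target):
    """Hypothetical pre-check mirroring the documented constraints of fit()."""
    # The target is documented as a tuple: (attribute name, target value).
    target_attribute, target_value = target
    if not isinstance(target_attribute, str) or not isinstance(target_value, str):
        raise TypeError("target must be a tuple of two str elements")
    # Missing values are documented as unsupported.
    if pandas_dataframe.isna().any().any():
        raise ValueError("missing values are not supported")
    # Only nominal attributes (type str) are documented as supported.
    for column in pandas_dataframe.columns:
        if not all(isinstance(value, str) for value in pandas_dataframe[column]):
            raise TypeError(f"column {column!r} contains non-str values")
    # Sanity check (an assumption): the target attribute should be a column.
    if target_attribute not in pandas_dataframe.columns:
        raise KeyError(target_attribute)


df = pd.DataFrame({"color": ["red", "blue", "red"], "label": ["yes", "no", "yes"]})
check_fit_inputs(df, ("label", "yes"))  # passes silently for a valid input
```

Numeric columns would need to be discretized and converted to strings beforehand for this check to pass.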
- property maximum_negative_overlap: float
Maximum negative overlap factor permitted to add a subgroup candidate to the subgroup list (i.e., a subgroup candidate will be added to the subgroup list only if its negative overlap factor is less than or equal to maximum_negative_overlap). Values close to 0 are stricter and admit only candidates with little overlap, while values close to 1 admit candidates with more overlap.
- property maximum_positive_overlap: float
Maximum positive overlap factor permitted to add a subgroup candidate to the subgroup list (i.e., a subgroup candidate will be added to the subgroup list only if its positive overlap factor is less than or equal to maximum_positive_overlap). Values close to 0 are stricter and admit only candidates with little overlap, while values close to 1 admit candidates with more overlap.
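The two thresholds act as an acceptance test on each candidate. This reference does not define how the overlap factor is computed, so the sketch below assumes one plausible definition: the share of rows covered by the candidate that are already covered by the subgroup list. The functions overlap_factor and accepts_candidate are hypothetical illustrations, not the DSLM implementation.

```python
def overlap_factor(candidate_bits, covered_bits):
    """Assumed definition: fraction of the candidate's rows already covered.

    Both arguments are strings of '0'/'1', one character per row.
    """
    candidate_rows = candidate_bits.count("1")
    if candidate_rows == 0:
        return 0.0
    shared = sum(
        1 for c, l in zip(candidate_bits, covered_bits) if c == "1" and l == "1"
    )
    return shared / candidate_rows


def accepts_candidate(candidate_pos, candidate_neg, covered_pos, covered_neg,
                      maximum_positive_overlap, maximum_negative_overlap):
    """Accept only if both overlap factors are within their thresholds."""
    return (
        overlap_factor(candidate_pos, covered_pos) <= maximum_positive_overlap
        and overlap_factor(candidate_neg, covered_neg) <= maximum_negative_overlap
    )


# Half of the candidate's positive rows are already covered by the list.
half_covered = overlap_factor("1100", "1000")
ok_loose = accepts_candidate("1100", "0011", "1000", "0000", 0.5, 0.0)
ok_strict = accepts_candidate("1100", "0011", "1000", "0000", 0.4, 0.0)
```

Under this assumed definition, the same candidate passes with a threshold of 0.5 and is rejected with 0.4, which illustrates why values close to 0 are described as stricter.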
subgroups.algorithms.subgroup_lists.gmsl module
This file contains the implementation of the GMSL algorithm.
- class subgroups.algorithms.subgroup_lists.gmsl.GMSL(input_file_path, max_sl, beta, output_file_path)[source]
Bases: Algorithm
This class represents the GMSL algorithm.
- Parameters:
input_file_path (str) – path of the file from which the subgroups and their bitarrays will be read.
max_sl (int) – maximum number of subgroup lists to generate.
beta (float) – level of normalization of the compression gain.
output_file_path (str) – path of the file in which the results will be written.
- INPUT_LINE_REGEX_PATTERN: typing.ClassVar[str] = "^(?P<subgroup>Description: \\[[&,\\.<>/=A-Za-z0-9_-]+ = ([&,\\.<>/=A-Za-z0-9_-]+|'[&,\\.<>/=A-Za-z0-9_-]+')(, [&,\\.<>/=A-Za-z0-9_-]+ = ([&,\\.<>/=A-Za-z0-9_-]+|'[&,\\.<>/=A-Za-z0-9_-]+'))*\\], Target: [&,\\.<>/=A-Za-z0-9_-]+ = ([&,\\.<>/=A-Za-z0-9_-]+|'[&,\\.<>/=A-Za-z0-9_-]+')) ; (?P<positive_bitarray>[01]+) ; (?P<negative_bitarray>[01]+)$"
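The pattern above describes one line of the input file: a subgroup description with its target, followed by a positive and a negative bitarray, separated by " ; ". The sketch below applies the pattern, copied verbatim, with the standard re module. The helper parse_input_line and the sample line are hypothetical, and how GMSL itself consumes the match is not documented here.

```python
import re

# Copied verbatim from the documented GMSL.INPUT_LINE_REGEX_PATTERN.
INPUT_LINE_REGEX_PATTERN = "^(?P<subgroup>Description: \\[[&,\\.<>/=A-Za-z0-9_-]+ = ([&,\\.<>/=A-Za-z0-9_-]+|'[&,\\.<>/=A-Za-z0-9_-]+')(, [&,\\.<>/=A-Za-z0-9_-]+ = ([&,\\.<>/=A-Za-z0-9_-]+|'[&,\\.<>/=A-Za-z0-9_-]+'))*\\], Target: [&,\\.<>/=A-Za-z0-9_-]+ = ([&,\\.<>/=A-Za-z0-9_-]+|'[&,\\.<>/=A-Za-z0-9_-]+')) ; (?P<positive_bitarray>[01]+) ; (?P<negative_bitarray>[01]+)$"


def parse_input_line(line):
    """Hypothetical helper: split one input line into its three named parts."""
    match = re.match(INPUT_LINE_REGEX_PATTERN, line)
    if match is None:
        raise ValueError(f"line does not match the expected format: {line!r}")
    return (
        match.group("subgroup"),
        match.group("positive_bitarray"),
        match.group("negative_bitarray"),
    )


# Made-up sample line that follows the documented format.
sample_line = (
    "Description: [color = 'red', size = 'big'], Target: label = 'yes'"
    " ; 1100 ; 0011"
)
subgroup, positive_bits, negative_bits = parse_input_line(sample_line)
```

Attribute names and values are limited to the characters in the pattern's character class (letters, digits, and `& , . < > / = _ -`), so a value containing a space does not match.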
- property beta: int | float
Level of normalization of the compression gain.
- fit(pandas_dataframe, target)[source]
Main method to run the GMSL algorithm. This algorithm only supports nominal attributes (i.e., type ‘str’). IMPORTANT: missing values are not supported.
- Parameters:
pandas_dataframe (pandas.core.frame.DataFrame) – the DataFrame which is scanned. This algorithm only supports nominal attributes (i.e., type ‘str’). IMPORTANT: missing values are not supported.
target (tuple[str, str]) – a tuple with 2 elements: the target attribute name and the target value.
- Return type:
None
- property input_file_path: str
Path of the file from which the subgroups and their bitarrays will be read.
- property max_sl: int
Maximum number of subgroup lists to generate.
- property output_file_path: str
Path of the file in which the results will be written.
subgroups.algorithms.subgroup_lists.psld module
This file contains the implementation of the PSLD algorithm.
- class subgroups.algorithms.subgroup_lists.psld.PSLD(input_file_path, max_sl, sl_max_size, beta, output_file_path)[source]
Bases: GMSL
This class represents the PSLD algorithm.
- Parameters:
input_file_path (str) – path of the file from which the subgroups and their bitarrays will be read.
max_sl (int) – maximum number of subgroup lists to generate.
sl_max_size (int) – maximum number of subgroups that each subgroup list will contain.
beta (float) – level of normalization of the compression gain.
output_file_path (str) – path of the file in which the results will be written.
- fit(pandas_dataframe, target)[source]
Main method to run the PSLD algorithm. This algorithm only supports nominal attributes (i.e., type ‘str’). IMPORTANT: missing values are not supported.
- Parameters:
pandas_dataframe (pandas.core.frame.DataFrame) – the DataFrame which is scanned. This algorithm only supports nominal attributes (i.e., type ‘str’). IMPORTANT: missing values are not supported.
target (tuple[str, str]) – a tuple with 2 elements: the target attribute name and the target value.
- Return type:
None
- property sl_max_size: int
Maximum number of subgroups that each subgroup list will contain.