subgroups.algorithms.subgroup_lists package

Submodules

subgroups.algorithms.subgroup_lists.dslm module

This file contains the implementation of the DSLM algorithm.

class subgroups.algorithms.subgroup_lists.dslm.DSLM(input_file_path, max_sl, sl_max_size, beta, maximum_positive_overlap, maximum_negative_overlap, output_file_path)

Bases: PSLD

This class represents the DSLM algorithm.

Parameters:
  • input_file_path (str) – path of the file from which the subgroups and their bitarrays will be read.

  • max_sl (int) – maximum number of subgroup lists to generate.

  • sl_max_size (int) – maximum number of subgroups that each subgroup list will contain.

  • beta (float) – level of normalization of the compression gain.

  • maximum_positive_overlap (float) – maximum positive overlap factor allowed when adding a subgroup candidate to the subgroup list (i.e., a candidate is added only if its positive overlap factor is less than or equal to maximum_positive_overlap). Values close to 0 are stricter and admit only candidates with little overlap, while values close to 1 admit candidates with more overlap.

  • maximum_negative_overlap (float) – maximum negative overlap factor allowed when adding a subgroup candidate to the subgroup list (i.e., a candidate is added only if its negative overlap factor is less than or equal to maximum_negative_overlap). Values close to 0 are stricter and admit only candidates with little overlap, while values close to 1 admit candidates with more overlap.

  • output_file_path (str) – path of the file in which the results will be written.
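The two overlap thresholds can be read as a simple acceptance test on each candidate. The sketch below is illustrative only: it assumes the overlap factors have already been computed (this page does not say how), it assumes both limits must hold at once, and the function name and the sample candidates are hypothetical.

```python
def within_overlap_limits(positive_overlap, negative_overlap,
                          maximum_positive_overlap, maximum_negative_overlap):
    """Return True if a candidate respects both overlap limits.

    Assumption: the two limits are combined with a logical AND.
    The factors themselves are taken as given, not computed here.
    """
    return (positive_overlap <= maximum_positive_overlap
            and negative_overlap <= maximum_negative_overlap)


# Hypothetical candidates: (label, positive overlap factor, negative overlap factor).
candidates = [
    ("candidate_a", 0.10, 0.05),
    ("candidate_b", 0.60, 0.05),
    ("candidate_c", 0.20, 0.20),
]

# Strict limits (close to 0) keep fewer candidates than loose limits (close to 1).
strict = [name for name, pos, neg in candidates
          if within_overlap_limits(pos, neg, 0.2, 0.1)]
loose = [name for name, pos, neg in candidates
         if within_overlap_limits(pos, neg, 0.9, 0.9)]
```

A candidate whose factor equals the limit exactly is still accepted, which matches the "less than or equal to" wording of the parameter descriptions.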

fit(pandas_dataframe, target)

Main method to run the DSLM algorithm. This algorithm only supports nominal attributes (i.e., type ‘str’). IMPORTANT: missing values are not supported.

Parameters:
  • pandas_dataframe (pandas.core.frame.DataFrame) – the DataFrame which is scanned. This algorithm only supports nominal attributes (i.e., type ‘str’). IMPORTANT: missing values are not supported.

  • target (tuple[str, str]) – a tuple with 2 elements: the target attribute name and the target value.

Return type:

None
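Because fit accepts only nominal ('str') attributes and no missing values, it can help to check a DataFrame before calling it. The helper below is a hypothetical pre-check written with plain pandas; it is not part of the subgroups package, and the check that the target attribute is a column is an assumption about what fit expects.

```python
import pandas as pd


def check_nominal_dataframe(pandas_dataframe, target):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    attribute_name, _target_value = target
    # Assumption: the target attribute must be a column of the DataFrame.
    if attribute_name not in pandas_dataframe.columns:
        problems.append(f"target attribute {attribute_name!r} is not a column")
    for column in pandas_dataframe.columns:
        series = pandas_dataframe[column]
        if series.isna().any():
            problems.append(f"column {column!r} has missing values")
        elif not series.map(lambda value: isinstance(value, str)).all():
            problems.append(f"column {column!r} has non-str values")
    return problems


# Hypothetical data: every attribute is a string and nothing is missing.
good = pd.DataFrame({"outlook": ["sunny", "rain"], "play": ["yes", "no"]})
# Hypothetical data with one missing value and one numeric column.
bad = pd.DataFrame({"outlook": ["sunny", None], "temp": [1, 2], "play": ["yes", "no"]})

good_problems = check_nominal_dataframe(good, ("play", "yes"))
bad_problems = check_nominal_dataframe(bad, ("play", "yes"))
```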

property maximum_negative_overlap: float

Maximum negative overlap factor allowed when adding a subgroup candidate to the subgroup list (i.e., a candidate is added only if its negative overlap factor is less than or equal to maximum_negative_overlap). Values close to 0 are stricter and admit only candidates with little overlap, while values close to 1 admit candidates with more overlap.

property maximum_positive_overlap: float

Maximum positive overlap factor allowed when adding a subgroup candidate to the subgroup list (i.e., a candidate is added only if its positive overlap factor is less than or equal to maximum_positive_overlap). Values close to 0 are stricter and admit only candidates with little overlap, while values close to 1 admit candidates with more overlap.

subgroups.algorithms.subgroup_lists.gmsl module

This file contains the implementation of the GMSL algorithm.

class subgroups.algorithms.subgroup_lists.gmsl.GMSL(input_file_path, max_sl, beta, output_file_path)

Bases: Algorithm

This class represents the GMSL algorithm.

Parameters:
  • input_file_path (str) – path of the file from which the subgroups and their bitarrays will be read.

  • max_sl (int) – maximum number of subgroup lists to generate.

  • beta (float) – level of normalization of the compression gain.

  • output_file_path (str) – path of the file in which the results will be written.
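The class attribute INPUT_LINE_REGEX_PATTERN suggests what a line of the input file looks like. The sketch below rebuilds that pattern from smaller pieces with the standard re module and applies it to a hand-made line. It assumes the doubled backslashes in the documented value are string-repr escaping of single backslashes; the sample line and the function name are hypothetical, and nothing here is taken from the library's actual parsing code.

```python
import re

# Pieces of the documented pattern, rebuilt for readability.
TOKEN = r"[&,\.<>/=A-Za-z0-9_-]+"
VALUE = rf"({TOKEN}|'{TOKEN}')"      # bare or single-quoted value
SELECTOR = rf"{TOKEN} = {VALUE}"     # attribute = value
INPUT_LINE_REGEX = re.compile(
    rf"^(?P<subgroup>Description: \[{SELECTOR}(, {SELECTOR})*\], Target: {SELECTOR})"
    r" ; (?P<positive_bitarray>[01]+) ; (?P<negative_bitarray>[01]+)$"
)


def parse_input_line(line):
    """Return the three named groups as a dict, or None if the line does not match."""
    match = INPUT_LINE_REGEX.match(line)
    if match is None:
        return None
    return {
        "subgroup": match.group("subgroup"),
        "positive_bitarray": match.group("positive_bitarray"),
        "negative_bitarray": match.group("negative_bitarray"),
    }


# Hypothetical line, constructed only to satisfy the pattern.
SAMPLE_LINE = "Description: [outlook = 'sunny', windy = 'no'], Target: play = 'yes' ; 0110 ; 1001"
parsed = parse_input_line(SAMPLE_LINE)
```

The three fields are separated by " ; " (space, semicolon, space), and each bitarray is a non-empty string of 0 and 1 characters.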

INPUT_LINE_REGEX_PATTERN: typing.ClassVar[str] = "^(?P<subgroup>Description: \\[[&,\\.<>/=A-Za-z0-9_-]+ = ([&,\\.<>/=A-Za-z0-9_-]+|'[&,\\.<>/=A-Za-z0-9_-]+')(, [&,\\.<>/=A-Za-z0-9_-]+ = ([&,\\.<>/=A-Za-z0-9_-]+|'[&,\\.<>/=A-Za-z0-9_-]+'))*\\], Target: [&,\\.<>/=A-Za-z0-9_-]+ = ([&,\\.<>/=A-Za-z0-9_-]+|'[&,\\.<>/=A-Za-z0-9_-]+')) ; (?P<positive_bitarray>[01]+) ; (?P<negative_bitarray>[01]+)$"

property beta: int | float

Level of normalization of the compression gain.

fit(pandas_dataframe, target)

Main method to run the GMSL algorithm. This algorithm only supports nominal attributes (i.e., type ‘str’). IMPORTANT: missing values are not supported.

Parameters:
  • pandas_dataframe (pandas.core.frame.DataFrame) – the DataFrame which is scanned. This algorithm only supports nominal attributes (i.e., type ‘str’). IMPORTANT: missing values are not supported.

  • target (tuple[str, str]) – a tuple with 2 elements: the target attribute name and the target value.

Return type:

None

property input_file_path: str

Path of the file from which the subgroups and their bitarrays will be read.

property max_sl: int

Maximum number of subgroup lists to generate.

property output_file_path: str

Path of the file in which the results will be written.

subgroups.algorithms.subgroup_lists.psld module

This file contains the implementation of the PSLD algorithm.

class subgroups.algorithms.subgroup_lists.psld.PSLD(input_file_path, max_sl, sl_max_size, beta, output_file_path)

Bases: GMSL

This class represents the PSLD algorithm.

Parameters:
  • input_file_path (str) – path of the file from which the subgroups and their bitarrays will be read.

  • max_sl (int) – maximum number of subgroup lists to generate.

  • sl_max_size (int) – maximum number of subgroups that each subgroup list will contain.

  • beta (float) – level of normalization of the compression gain.

  • output_file_path (str) – path of the file in which the results will be written.

fit(pandas_dataframe, target)

Main method to run the PSLD algorithm. This algorithm only supports nominal attributes (i.e., type ‘str’). IMPORTANT: missing values are not supported.

Parameters:
  • pandas_dataframe (pandas.core.frame.DataFrame) – the DataFrame which is scanned. This algorithm only supports nominal attributes (i.e., type ‘str’). IMPORTANT: missing values are not supported.

  • target (tuple[str, str]) – a tuple with 2 elements: the target attribute name and the target value.

Return type:

None
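Putting the constructor and fit together, a call might look like the sketch below. It relies only on the signatures documented on this page. The wrapper name, the default values for max_sl, sl_max_size and beta, and the file names in the comment are placeholders rather than recommendations, and the sketch assumes the input file already exists in the format the algorithm reads.

```python
def run_psld(pandas_dataframe, target, input_file_path, output_file_path,
             max_sl=3, sl_max_size=5, beta=1.0):
    """Build a PSLD instance from the documented parameters and run fit.

    The numeric defaults are placeholder values chosen for illustration.
    """
    # Imported lazily so this sketch can be loaded without the package installed.
    from subgroups.algorithms.subgroup_lists.psld import PSLD

    model = PSLD(
        input_file_path=input_file_path,
        max_sl=max_sl,
        sl_max_size=sl_max_size,
        beta=beta,
        output_file_path=output_file_path,
    )
    # fit is documented to return None; results go to output_file_path.
    model.fit(pandas_dataframe, target)
    return model


# Hypothetical call, with made-up file names and a DataFrame `df` of 'str' columns:
# run_psld(df, ("play", "yes"), "subgroups.txt", "subgroup_lists.txt")
```

Since fit returns None, the place to look for results is the file named by output_file_path.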

property sl_max_size: int

Maximum number of subgroups that each subgroup list will contain.