subgroups.utils package
Submodules
subgroups.utils.dataframe_filters module
This file contains the implementation of different functions used to filter a pandas DataFrame according to certain criteria.
- subgroups.utils.dataframe_filters.filter_by_list_of_selectors(pandas_dataframe, list_of_selectors)
Filter a pandas DataFrame, retrieving only the rows covered by all the selectors in the parameter ‘list_of_selectors’. IMPORTANT: if the attribute name of any selector is not a column of the pandas DataFrame passed as a parameter, a KeyError exception is raised.
- Parameters:
pandas_dataframe (pandas.core.frame.DataFrame) – the DataFrame which is filtered.
list_of_selectors (list[subgroups.core.selector.Selector]) – the list of selectors used in the filtering process. IMPORTANT: we assume that the parameter ‘list_of_selectors’ only contains selectors.
- Return type:
pandas.core.frame.DataFrame
- Returns:
the pandas DataFrame obtained after the filtering process.
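The filtering semantics described above (a row is kept only if every selector covers it) can be sketched with plain pandas. This is an illustration, not the library's implementation: the function name `filter_by_conditions` and the `(attribute_name, comparison, value)` tuples are hypothetical stand-ins for `subgroups.core.selector.Selector` objects.

```python
import operator

import pandas as pd


def filter_by_conditions(df, conditions):
    """Keep only the rows that satisfy every (attribute, comparison, value) condition.

    Hypothetical helper: each tuple stands in for a Selector object.
    """
    mask = pd.Series(True, index=df.index)
    for attribute_name, comparison, value in conditions:
        # df[attribute_name] raises KeyError when the column is missing,
        # which matches the behaviour documented for the real function.
        mask &= comparison(df[attribute_name], value)
    return df[mask]


df = pd.DataFrame({"age": [25, 40, 61], "smoker": ["no", "yes", "yes"]})
conditions = [("age", operator.gt, 30), ("smoker", operator.eq, "yes")]
filtered = filter_by_conditions(df, conditions)
```

Combining the per-selector masks with a logical AND is what makes the result the set of rows covered by all selectors at once; an empty list of conditions leaves the DataFrame unchanged.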
subgroups.utils.file_format_transformations module
This file contains the implementation of different functions used to transform the resulting files obtained by the algorithms.
- subgroups.utils.file_format_transformations.to_input_format_for_subgroup_list_algorithms(original_file_path, transformed_file_path)
Method to transform the format of a file generated by a traditional SD algorithm (that mines a subgroup set) to the input file format of the algorithms that mine subgroup lists.
- Parameters:
original_file_path (str) – path of the original file.
transformed_file_path (str) – path of the transformed file.
- Return type:
tuple[int, int]
- Returns:
a 2-tuple of the form: (number of subgroups correctly read, number of subgroups not correctly read).
subgroups.utils.mdl module
This file contains the implementation of different functions related to the MDL (Minimum Description Length) principle.
- subgroups.utils.mdl.log2_multinomial_with_recurrence(number_of_categories, number_of_samples)
Compute the logarithm to base 2 of the multinomial distribution complexity.
- Parameters:
number_of_categories (int) – number of categories of the multinomial distribution.
number_of_samples (int) – number of instances/points/samples/rows/registers.
- Return type:
float
- Returns:
the logarithm to base 2 of the multinomial distribution complexity or 0 if the multinomial distribution complexity is 0.
- subgroups.utils.mdl.multinomial_with_recurrence(number_of_categories, number_of_samples)
Compute the multinomial distribution complexity.
- Parameters:
number_of_categories (int) – number of categories of the multinomial distribution.
number_of_samples (int) – number of instances/points/samples/rows/registers.
- Return type:
float
- Returns:
the multinomial distribution complexity.
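As an illustration of what a recurrence-based computation of the multinomial complexity can look like, the sketch below uses the well-known linear-time recurrence C(K, n) = C(K-1, n) + n/(K-2) · C(K-2, n), with C(1, n) = 1 and C(2, n) computed as an exact sum. This is an assumption about the general technique, not a copy of the library's code: the function names are hypothetical, and the library may approximate C(2, n) rather than sum it exactly. The exact sum is only practical for small sample counts.

```python
from math import comb, log2


def multinomial_complexity(number_of_categories, number_of_samples):
    """Sketch of the multinomial complexity C(K, n) via the linear-time recurrence."""
    n = number_of_samples
    if number_of_categories < 1 or n < 0:
        raise ValueError("need at least one category and a non-negative sample count")
    if n == 0 or number_of_categories == 1:
        return 1.0  # C(K, 0) = 1 and C(1, n) = 1
    c_prev = 1.0  # C(1, n)
    # C(2, n): exact binomial sum (note that 0.0 ** 0 == 1.0 in Python).
    c_curr = sum(
        comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
        for k in range(n + 1)
    )
    # C(K, n) = C(K-1, n) + n / (K-2) * C(K-2, n) for K >= 3.
    for k in range(3, number_of_categories + 1):
        c_prev, c_curr = c_curr, c_curr + n / (k - 2) * c_prev
    return c_curr


def log2_multinomial_complexity(number_of_categories, number_of_samples):
    """Base-2 logarithm of the complexity, or 0 when the complexity is 0."""
    value = multinomial_complexity(number_of_categories, number_of_samples)
    return log2(value) if value > 0 else 0.0
```

For example, with two categories and two samples the four possible sequences have maximum-likelihood probabilities 1, 1, 1/4 and 1/4, so `multinomial_complexity(2, 2)` evaluates to 2.5.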
- subgroups.utils.mdl.universal_code_for_integer(input_integer_value)
Compute the universal code LN(i) for the input integer value.
- Parameters:
input_integer_value (int) – integer value on which to compute the universal code.
- Return type:
float
- Returns:
the universal code LN(i) for the input integer value.
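For orientation, Rissanen's universal code for integers is usually defined as LN(i) = log2(c0) + log2(i) + log2(log2(i)) + ..., where only the positive terms are summed and c0 ≈ 2.865064. The sketch below follows that textbook definition; the function name and the handling of invalid input are assumptions, and the library's implementation may differ in details such as the stopping threshold.

```python
from math import log2

# Approximate normalising constant of Rissanen's universal code for integers.
C0 = 2.865064


def universal_code_length(i):
    """Sketch of LN(i) in bits for an integer i >= 1 (hypothetical helper)."""
    if i < 1:
        raise ValueError("the universal code is defined for integers >= 1")
    total = log2(C0)
    term = log2(i)
    # Keep adding iterated logarithms while they remain positive.
    while term > 0:
        total += term
        term = log2(term)
    return total
```

For i = 16 the iterated logarithms are 4, 2 and 1, so the code length is log2(c0) + 7 bits; for i = 1 only the constant term log2(c0) remains.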
- subgroups.utils.mdl.universal_code_for_integer_with_maximum(input_integer_value, maximum_integer_value)
Compute the universal code LN(i) for the input integer value, when a maximum integer value exists.
- Parameters:
input_integer_value (int) – integer value on which to compute the universal code.
maximum_integer_value (int) – maximum integer value existing.
- Return type:
float
- Returns:
the universal code LN(i) for the input integer value.