subgroups.data_structures package

Submodules

subgroups.data_structures.bitset_bsd module

This file contains the implementation of the Bitset data structure used in the BSD algorithm and its variants.

class subgroups.data_structures.bitset_bsd.BitsetBSD[source]

Bases: object

This class represents a bitset used in the BSD algorithm and its variants.

property bitset_neg: dict: The bitset dictionary for rows that do not match the target value.

property bitset_pos: dict: The bitset dictionary for rows that match the target value.

build_bitset(pandas_dataframe, set_of_frequent_selectors, tuple_target_attribute_value)[source]

Method to build the complete bitset using a set of frequent selectors.

Parameters:

pandas_dataframe (pandas.core.frame.DataFrame) – Input dataset. It is VERY IMPORTANT to respect the following conditions: (1) the dataset must be a pandas dataframe, (2) the dataset must not contain missing values, (3) for each attribute, all its values must be of the same type.
set_of_frequent_selectors (list) – The set of frequent selectors (L) to use when building the bitset.
tuple_target_attribute_value (tuple) – Tuple with the name of the target attribute (first element) and with the value of this attribute (second element). EXAMPLE1: (“age”, 25). EXAMPLE2: (“class”, “Setosa”). It is VERY IMPORTANT to respect the following conditions: (1) the name of the target attribute MUST be a string, (2) the name of the target attribute MUST exist in the dataset, (3) it is VERY IMPORTANT to respect the types of the attributes: the value in the tuple (second element) MUST BE comparable with the values of the corresponding attribute in the dataset, (4) the value of the target attribute MUST exist in the dataset.

Return type:

None

generate_set_of_frequent_selectors(pandas_dataframe, tuple_target_attribute_value, min_support)[source]

Method to scan the dataset (ONLY DISCRETE/NOMINAL ATTRIBUTES) and collect the sorted set of frequent selectors (L).

Parameters:

pandas_dataframe (pandas.DataFrame) – Input dataset. It is VERY IMPORTANT to respect the following conditions: (1) the dataset must be a pandas dataframe, (2) the dataset must not contain missing values, (3) for each attribute, all its values must be of the same type.
tuple_target_attribute_value (tuple) – Tuple with the name of the target attribute (first element) and with the value of this attribute (second element). EXAMPLE1: (“age”, 25). EXAMPLE2: (“class”, “Setosa”). It is VERY IMPORTANT to respect the following conditions: (1) the name of the target attribute MUST be a string, (2) the name of the target attribute MUST exist in the dataset, (3) it is VERY IMPORTANT to respect the types of the attributes: the value in the tuple (second element) MUST BE comparable with the values of the corresponding attribute in the dataset, (4) the value of the target attribute MUST exist in the dataset.
min_support (int) – Minimum support threshold (NUMBER OF TIMES, NOT A PROPORTION).

Return type:

list

Returns:

the sorted set of frequent selectors (L) as a list.
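
The scan described above can be sketched without the library. This is only an illustration under stated assumptions: rows are plain dicts rather than a pandas DataFrame, a selector is an (attribute, value) equality pair, support is counted over all rows, and the result is ordered by descending count. The helper name frequent_selectors is hypothetical, and neither the counting rule nor the ordering is confirmed by this page.

```python
from collections import Counter


def frequent_selectors(rows, target_attribute, min_support):
    """Return (attribute, value) pairs occurring in at least min_support rows.

    min_support is an absolute count, not a proportion.
    """
    counts = Counter()
    for row in rows:
        for attribute, value in row.items():
            if attribute != target_attribute:  # the target is not a selector
                counts[(attribute, value)] += 1
    frequent = [(sel, c) for sel, c in counts.items() if c >= min_support]
    # Assumed ordering: most frequent first, ties broken by the selector itself.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [sel for sel, _ in frequent]


rows = [
    {"color": "red", "size": "S", "class": "yes"},
    {"color": "red", "size": "M", "class": "no"},
    {"color": "blue", "size": "S", "class": "yes"},
]
selectors = frequent_selectors(rows, "class", min_support=2)
print(selectors)  # [('color', 'red'), ('size', 'S')]
```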

class subgroups.data_structures.bitset_bsd.BitsetDictionary[source]

Bases: dict

Internal class to implement the dictionaries used in the bitset. This dictionary only accepts a Pattern or a Selector as a key. If a Selector is inserted, it is converted to a Pattern. Each entry must store a bitarray.

subgroups.data_structures.bitset_qfinder module

This file contains the implementation of the Bitset data structure used in the QFinder algorithm to create the regression models.

class subgroups.data_structures.bitset_qfinder.Bitset_QFinder[source]

Bases: object

This class represents a bitset used in the QFinder algorithm.

compute_credibility_measures(target_column)[source]

Method to compute the credibility measures for each candidate pattern.

Parameters: target_column – target column of the dataset.
Return type: pandas.core.frame.DataFrame
Returns: a pandas DataFrame with the credibility values for each candidate pattern.

generate_bitset(df, tuple_target_attribute_value, list_of_candidate_patterns)[source]

This method generates a bitset from a dataset and a list of candidate patterns. Each column of the bitset represents a candidate pattern and each row represents an instance of the dataset. The value of each cell is True if the corresponding pattern appears in the corresponding instance and False otherwise.

Parameters:

df (pandas.core.frame.DataFrame) – dataset from which the bitset is generated.
tuple_target_attribute_value (tuple) – tuple which contains the name of the target attribute and its value.
list_of_candidate_patterns (list[subgroups.core.pattern.Pattern]) – list of candidate patterns.

Return type:

None

get_non_empty_patterns()[source]

Method to get the candidate patterns after removing those that do not appear in the dataset.

Return type: list[subgroups.core.pattern.Pattern]
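
The layout described for generate_bitset (one row per instance, one column per candidate pattern) and the filtering done by get_non_empty_patterns can be sketched as follows. This is not the library's implementation: rows are plain dicts, a pattern is assumed to be a list of (attribute, value) equality pairs, and the function name build_bitset is hypothetical.

```python
def build_bitset(rows, patterns):
    """Return one list of booleans per row, with one entry per candidate pattern.

    A cell is True when every (attribute, value) pair of the pattern holds in the row.
    """
    return [
        [all(row.get(attr) == value for attr, value in pattern) for pattern in patterns]
        for row in rows
    ]


rows = [
    {"sex": "F", "smoker": "no"},
    {"sex": "M", "smoker": "yes"},
    {"sex": "F", "smoker": "yes"},
]
patterns = [
    [("sex", "F")],
    [("sex", "F"), ("smoker", "yes")],
    [("sex", "M"), ("smoker", "no")],
]
bitset = build_bitset(rows, patterns)

# A column that is False in every row belongs to a pattern absent from the dataset.
non_empty = [p for j, p in enumerate(patterns) if any(r[j] for r in bitset)]
```

Here the third pattern never occurs, so only the first two survive the filter.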

subgroups.data_structures.fp_tree_for_sdmap module

This file contains the implementation of the FPTree data structure used in the SDMap algorithm.

class subgroups.data_structures.fp_tree_for_sdmap.FPTreeForSDMap[source]

Bases: object

This class represents the FPTree data structure used in the SDMap algorithm.

build_tree(pandas_dataframe, set_of_frequent_selectors, target)[source]

Method to build the complete FPTree from a pandas DataFrame and using the set of frequent selectors. IMPORTANT: missing values are not supported yet.

Parameters:

pandas_dataframe (pandas.core.frame.DataFrame) – the DataFrame which is scanned. IMPORTANT: missing values are not supported yet.
set_of_frequent_selectors (dict[str, tuple[subgroups.core.selector.Selector, list[int], int]]) – the set of frequent selectors generated by the method ‘generate_set_of_frequent_selectors’.
target (tuple[str, typing.Union[int, float, str]]) – a tuple with 2 elements: the target attribute name and the target value.

Return type:

None

generate_conditional_fp_tree(list_of_selectors, minimum_tp=None, minimum_fp=None, minimum_n=None)[source]

Method to get the conditional FPTree with a list of selectors. Two threshold types can be used: (1) the true positives tp and the false positives fp separately or (2) the subgroup description size n (n = tp + fp). This means that: (1) if ‘minimum_tp’ and ‘minimum_fp’ have a value of type ‘int’, ‘minimum_n’ must be None; and (2) if ‘minimum_n’ has a value of type ‘int’, ‘minimum_tp’ and ‘minimum_fp’ must be None.

Parameters:

list_of_selectors (list[subgroups.core.selector.Selector]) – the list of selectors which is used. IMPORTANT: we assume that the list of selectors only contains selectors.
minimum_tp (typing.Optional[int]) – the minimum true positives (tp) threshold.
minimum_fp (typing.Optional[int]) – the minimum false positives (fp) threshold.
minimum_n (typing.Optional[int]) – the minimum subgroup description size (n) threshold.

Return type:

subgroups.data_structures.fp_tree_for_sdmap.FPTreeForSDMap

Returns:

the generated conditional FPTree.
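
The mutually exclusive threshold rule can be illustrated with a small validation helper. This is a sketch only: the names check_thresholds and passes are hypothetical, raising ValueError on an invalid combination is an assumption, and so is the use of ">=" for the comparisons.

```python
def check_thresholds(minimum_tp=None, minimum_fp=None, minimum_n=None):
    """Return 'tp_fp' or 'n' depending on which threshold type was supplied."""
    uses_tp_fp = isinstance(minimum_tp, int) and isinstance(minimum_fp, int)
    uses_n = isinstance(minimum_n, int)
    if uses_tp_fp and minimum_n is None:
        return "tp_fp"
    if uses_n and minimum_tp is None and minimum_fp is None:
        return "n"
    # Assumed behaviour: any other combination is rejected.
    raise ValueError("use either (minimum_tp, minimum_fp) or minimum_n, not both")


def passes(tp, fp, minimum_tp=None, minimum_fp=None, minimum_n=None):
    """Check a (tp, fp) pair against whichever threshold type is active."""
    mode = check_thresholds(minimum_tp, minimum_fp, minimum_n)
    if mode == "tp_fp":
        return tp >= minimum_tp and fp >= minimum_fp
    return tp + fp >= minimum_n  # n = tp + fp
```

For example, passes(3, 1, minimum_n=4) is True because n = 3 + 1 = 4, while passes(3, 1, minimum_tp=2, minimum_fp=2) is False because fp falls below its own threshold.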

generate_set_of_frequent_selectors(pandas_dataframe, target, minimum_tp=None, minimum_fp=None, minimum_n=None)[source]

Method to scan the pandas DataFrame in order to generate the set of frequent selectors. Two threshold types can be used: (1) the true positives tp and the false positives fp separately or (2) the subgroup description size n (n = tp + fp). This means that: (1) if ‘minimum_tp’ and ‘minimum_fp’ have a value of type ‘int’, ‘minimum_n’ must be None; and (2) if ‘minimum_n’ has a value of type ‘int’, ‘minimum_tp’ and ‘minimum_fp’ must be None. IMPORTANT: missing values are not supported yet.

Parameters:

pandas_dataframe (pandas.core.frame.DataFrame) – the DataFrame which is scanned. IMPORTANT: missing values are not supported yet.
target (tuple[str, typing.Union[int, float, str]]) – a tuple with 2 elements: the target attribute name and the target value.
minimum_tp (typing.Optional[int]) – the minimum true positives (tp) threshold.
minimum_fp (typing.Optional[int]) – the minimum false positives (fp) threshold.
minimum_n (typing.Optional[int]) – the minimum subgroup description size (n) threshold.

Return type:

dict[str, tuple[subgroups.core.selector.Selector, list[int], int]]

Returns:

a dictionary in which the keys are strings (the concatenation of the selector attribute name and the selector value) and the values are tuples with 3 elements: (1) the selector, (2) a list with 2 elements: the true positives tp of it and the false positives fp of it, and (3) a number indicating the insertion order in this dictionary (starting from 0).
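
The shape of the returned dictionary can be made concrete with a small stand-in. The sketch below assumes plain dict rows instead of a DataFrame, an (attribute, value) tuple in place of a Selector object, a key built by plain string concatenation, and insertion in scan order. The exact key format and ordering used by the library are not specified on this page, and the function name frequent_selector_table is hypothetical.

```python
def frequent_selector_table(rows, target, minimum_n):
    """Map 'attribute+value' keys to (selector, [tp, fp], insertion_order)."""
    target_attribute, target_value = target
    counts = {}  # (attribute, value) -> [tp, fp]
    for row in rows:
        is_positive = row[target_attribute] == target_value
        for attribute, value in row.items():
            if attribute == target_attribute:
                continue
            tp_fp = counts.setdefault((attribute, value), [0, 0])
            tp_fp[0 if is_positive else 1] += 1

    table = {}
    for (attribute, value), (tp, fp) in counts.items():
        if tp + fp >= minimum_n:  # threshold on n = tp + fp
            key = f"{attribute}{value}"  # assumed key format
            table[key] = ((attribute, value), [tp, fp], len(table))
    return table


rows = [
    {"color": "red", "size": "S", "class": "yes"},
    {"color": "red", "size": "M", "class": "no"},
    {"color": "blue", "size": "S", "class": "yes"},
]
table = frequent_selector_table(rows, ("class", "yes"), minimum_n=2)
# table["colorred"] is (("color", "red"), [1, 1], 0)
# table["sizeS"]    is (("size", "S"), [2, 0], 1)
```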

property header_table: dict[Selector, list[object]]: The header table.

header_table_as_str(follow_node_links=True)[source]

Method to print all the entries of the FPTree header table.

Parameters: follow_node_links (bool) – whether to print all the FPTreeNode ids in the horizontal list or only the first one. By default, True.
Return type: str
Returns: the printed header table.

is_empty()[source]

Method to check whether the FPTree only has the root node.

Return type: bool
Returns: whether the FPTree only has the root node.

property root_node: FPTreeNode: The root of the tree.

property sorted_header_table: list: A list with the selectors of the header table sorted according to the summation of the ‘n’ (summation of the true positives tp + summation of the false positives fp).

there_is_a_single_path()[source]

Method to check whether all internal nodes only have 1 child.

Return type: bool
Returns: whether all internal nodes only have 1 child.
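
The single-path condition can be checked by walking down from the root and stopping as soon as a node has more than one child. The sketch below uses a hypothetical adjacency mapping (node id to list of child ids) rather than FPTreeNode objects. The helper name has_single_path is an assumption, and so is treating a root-only tree as a single path.

```python
def has_single_path(children_of, root):
    """Return True if no node reachable from root has more than one child."""
    node = root
    while children_of.get(node):
        children = children_of[node]
        if len(children) > 1:
            return False
        node = children[0]  # follow the only child
    return True


chain = {"root": ["a"], "a": ["b"], "b": []}
branching = {"root": ["a"], "a": ["b", "c"], "b": [], "c": []}
only_root = {"root": []}
```

With these inputs, has_single_path(chain, "root") and has_single_path(only_root, "root") are True, and has_single_path(branching, "root") is False.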

tree_as_str()[source]

Method to print as str the complete FPTree from the root node.

Return type: str
Returns: the printed FPTree.

subgroups.data_structures.fp_tree_for_sdmapstar module

This file contains the implementation of the FPTree data structure used in the SDMapStar algorithm.

class subgroups.data_structures.fp_tree_for_sdmapstar.FPTreeForSDMapStar(TP, FP)[source]

Bases: FPTreeForSDMap

This class represents the FPTree data structure used in the SDMapStar algorithm.

generate_conditional_fp_tree_star(list_of_selectors, min_optimistic_estimate, optimistic_estimate, additional_parameters={}, minimum_tp=None, minimum_fp=None, minimum_n=None)[source]

Method to get the conditional FPTree with a list of selectors. Two threshold types can be used: (1) the true positives tp and the false positives fp separately or (2) the subgroup description size n (n = tp + fp). This means that: (1) if ‘minimum_tp’ and ‘minimum_fp’ have a value of type ‘int’, ‘minimum_n’ must be None; and (2) if ‘minimum_n’ has a value of type ‘int’, ‘minimum_tp’ and ‘minimum_fp’ must be None.

Parameters:

list_of_selectors (list[subgroups.core.selector.Selector]) – the list of selectors which is used. IMPORTANT: we assume that the list of selectors only contains selectors.
min_optimistic_estimate (int) – the minimum optimistic estimate threshold.
optimistic_estimate (subgroups.quality_measures.quality_measure.QualityMeasure) – the optimistic estimate quality measure.
additional_parameters (dict) – the additional parameters for the optimistic estimate quality measure.
minimum_tp (typing.Optional[int]) – the minimum true positives (tp) threshold.
minimum_fp (typing.Optional[int]) – the minimum false positives (fp) threshold.
minimum_n (typing.Optional[int]) – the minimum subgroup description size (n) threshold.

Return type:

tuple[subgroups.data_structures.fp_tree_for_sdmapstar.FPTreeForSDMapStar, int]

Returns:

the generated conditional FPTree and the number of pruned branches.

subgroups.data_structures.fp_tree_node module

This file contains the implementation of a generic FPTree Node.

class subgroups.data_structures.fp_tree_node.FPTreeNode(selector, counters, node_link)[source]

Bases: object

This class represents a generic FPTree Node.

Parameters:

selector (subgroups.core.selector.Selector) – the Selector which is represented by this node.
counters (list[int]) – a list with the needed counters (the meaning of its elements depends on the situation). IMPORTANT: we assume that this list only contains values of type ‘int’.
node_link (typing.Optional[subgroups.data_structures.fp_tree_node.FPTreeNode]) – the next node in the FPTree with the same selector as this one (or None if it does not exist).

add_child(child_node)[source]

Method to add a child node to the current node. The current node will be the parent of the added child node. IMPORTANT: if there is already a child node with the same selector, a DuplicateFpTreeNodeError exception is raised.

Parameters: child_node (subgroups.data_structures.fp_tree_node.FPTreeNode) – the child node which is added.
Return type: None

property counters: list[int]: A list with the needed counters (the meaning of its elements depends on the situation). IMPORTANT: we assume that this list only contains values of type ‘int’.

delete_child_by_selector(selector)[source]

Method to delete a child node from the current node by selector. The current node will not be the parent of the deleted child node anymore. IMPORTANT: if there is no child node with the selector, a KeyError exception is raised.

Parameters: selector (subgroups.core.selector.Selector) – the selector which is used in order to delete the child node.
Return type: None

get_child_by_selector(selector)[source]

Method to get the child whose selector is passed by parameter. IMPORTANT: if there is no child node with that selector, this method returns None.

Parameters: selector (subgroups.core.selector.Selector) – the selector which is checked.
Return type: typing.Optional[subgroups.data_structures.fp_tree_node.FPTreeNode]
Returns: the child whose selector is passed by parameter or None if it does not exist.
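
The three child-management methods (add_child, delete_child_by_selector, get_child_by_selector) have different failure modes, which the following stand-in makes explicit. It is a minimal sketch, not the library class: Node and DuplicateNodeError are hypothetical names, selectors are plain hashable tuples, and keeping children in a dict keyed by selector is an assumption.

```python
class DuplicateNodeError(Exception):
    """Stand-in for the library's DuplicateFpTreeNodeError."""


class Node:
    def __init__(self, selector, counters):
        self.selector = selector
        self.counters = counters
        self.parent = None
        self._children = {}  # selector -> Node

    def add_child(self, child):
        # A second child with the same selector is rejected.
        if child.selector in self._children:
            raise DuplicateNodeError(child.selector)
        self._children[child.selector] = child
        child.parent = self

    def delete_child_by_selector(self, selector):
        child = self._children.pop(selector)  # raises KeyError if absent
        child.parent = None

    def get_child_by_selector(self, selector):
        return self._children.get(selector)  # None if absent

    @property
    def number_of_children(self):
        return len(self._children)


root = Node(None, [0, 0])
child = Node(("color", "red"), [1, 1])
root.add_child(child)
```

Looking up a missing selector returns None, whereas deleting one raises KeyError and adding a duplicate raises the dedicated exception.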

has_this_child(node)[source]

Method to check whether the node passed by parameter is a child of this one.

Parameters: node (subgroups.data_structures.fp_tree_node.FPTreeNode) – the node which is checked.
Return type: bool
Returns: whether the node passed by parameter is a child of this one.

is_child_of(node)[source]

Method to check whether the node passed by parameter is the parent of this one or, when None is passed, whether this node has no parent.

Parameters: node (typing.Optional[subgroups.data_structures.fp_tree_node.FPTreeNode]) – the node which is checked or None.
Return type: bool
Returns: whether the node passed by parameter is the parent of this one or, if None was passed, whether this node has no parent.

property node_link: FPTreeNode | None: The next node in the FPTree with the same selector as this one (or None if it does not exist).

property number_of_children: int: The number of children of this node.

property parent: FPTreeNode | None: The parent of this node.

property selector: Selector: The Selector which is represented by this node.

tree_as_str(current_depth=0)[source]

Method to print as str the current node and the complete subtree from the current node.

Parameters: current_depth (int) – the depth of the current node. By default, 0.
Return type: str
Returns: the printed result (the current node and the complete subtree from the current node).

subgroups.data_structures.subgroup_list module

This file contains the implementation of the Subgroup List data structure.

class subgroups.data_structures.subgroup_list.SubgroupList(dataset_target_bitarray_of_positives, dataset_target_bitarray_of_negatives, number_of_dataset_instances)[source]

Bases: object

This class represents a Subgroup List.

Parameters:

dataset_target_bitarray_of_positives (bitarray.bitarray) – the bitarray of the dataset instances which are covered by the target (its length must be equal to the number of instances of the dataset).
dataset_target_bitarray_of_negatives (bitarray.bitarray) – the bitarray of the dataset instances which are not covered by the target (its length must be equal to the number of instances of the dataset).
number_of_dataset_instances (int) – number of instances of the dataset.

add_subgroup(subgroup, bitarray_of_positives, bitarray_of_negatives)[source]

Method to add an individual subgroup at the end of the subgroup list (and before the default rule).

Parameters:

subgroup (subgroups.core.subgroup.Subgroup) – subgroup which is added.
bitarray_of_positives (bitarray.bitarray) – the bitarray of the dataset instances (considering the complete dataset) which are covered by the subgroup description and by the subgroup target.
bitarray_of_negatives (bitarray.bitarray) – the bitarray of the dataset instances (considering the complete dataset) which are covered by the subgroup description, but not by the subgroup target.

Return type:

None

property dataset_number_of_negatives: int: Number of instances (considering the complete dataset) which are not covered by the target.

property dataset_number_of_positives: int: Number of instances (considering the complete dataset) which are covered by the target.

property dataset_target_distribution: float: Target distribution considering the complete dataset.

property default_rule_bitarray_of_negatives: bitarray: The bitarray of the dataset instances which are not covered by the target.

property default_rule_bitarray_of_positives: bitarray: The bitarray of the dataset instances which are covered by the target.

delete_last_subgroup()[source]

Method to delete the last individual subgroup from the subgroup list. If the subgroup list is empty, this method does nothing.

Return type: None

get_subgroup(index)[source]

Return type: subgroups.core.subgroup.Subgroup

get_subgroup_bitarray_of_negatives(index)[source]

Get the bitarray of negatives of the subgroup in the position ‘index’. This bitarray depends on the position of the subgroup in the list (i.e., it DOES NOT consider the complete dataset).

Return type: bitarray.bitarray

get_subgroup_bitarray_of_positives(index)[source]

Get the bitarray of positives of the subgroup in the position ‘index’. This bitarray depends on the position of the subgroup in the list (i.e., it DOES NOT consider the complete dataset).

Return type: bitarray.bitarray

get_subgroup_original_bitarray_of_negatives(index)[source]

Get the original bitarray of negatives of the subgroup in the position ‘index’. This bitarray considers the subgroup individually (i.e., with respect to the complete dataset).

Return type: bitarray.bitarray

get_subgroup_original_bitarray_of_positives(index)[source]

Get the original bitarray of positives of the subgroup in the position ‘index’. This bitarray considers the subgroup individually (i.e., with respect to the complete dataset).

Return type: bitarray.bitarray
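
The difference between the "original" bitarrays and the position-dependent ones can be illustrated with integer bitmasks. The sketch assumes the usual subgroup-list semantics, in which an instance is claimed by the first subgroup in the list that covers it and whatever remains falls to the default rule. That semantics is not spelled out on this page, so treat it as an assumption. Python ints stand in for bitarray objects, and positional_bitmasks is a hypothetical helper.

```python
def positional_bitmasks(subgroups, number_of_instances):
    """Restrict each subgroup's (positives, negatives) masks to unclaimed instances.

    subgroups: list of (positives_mask, negatives_mask) pairs computed on the
    complete dataset, in list order.
    Returns the position-dependent pairs and the mask left for the default rule.
    """
    remaining = (1 << number_of_instances) - 1  # every instance is still unclaimed
    positional = []
    for positives, negatives in subgroups:
        positional.append((positives & remaining, negatives & remaining))
        remaining &= ~(positives | negatives)  # drop what this subgroup covers
    return positional, remaining


# Bit i (from the right) stands for instance i of a 5-instance dataset.
original = [(0b00101, 0b00010), (0b01100, 0b00010)]
positional, default_rule = positional_bitmasks(original, 5)
```

The second subgroup originally covers instances 1, 2 and 3, but instances 1 and 2 are already claimed by the first subgroup, so its position-dependent masks keep only instance 3. Instance 4 is covered by no subgroup and is left to the default rule.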

is_empty()[source]

Return type: bool

property number_of_dataset_instances: int: Number of instances of the dataset.

subgroups.data_structures.vertical_list module

This file contains the implementation of the root class of all the implemented Vertical Lists (data structure used by the VLSD algorithm). Conceptually, a Vertical List is similar to a Subgroup. This class is an abstract class and cannot be instantiated.

class subgroups.data_structures.vertical_list.VerticalList(list_of_selectors, sequence_of_instances_tp, sequence_of_instances_fp, number_of_dataset_instances, quality_value)[source]

Bases: ABC

This abstract class defines the root class of all the implemented Vertical Lists (data structure used by the VLSD algorithm). Conceptually, a Vertical List is similar to a Subgroup.

abstract compute_quality_value(quality_measure, dict_of_parameters)[source]

Method to compute the Vertical List quality value using the dictionary of parameters passed by parameter. This method uses the parameters ‘tp’ and ‘fp’ of the Vertical List, not of the dictionary of parameters passed by parameter. IMPORTANT: this method does not modify the Vertical List.

Parameters:

quality_measure (subgroups.quality_measures.quality_measure.QualityMeasure) – the quality measure which is used.
dict_of_parameters (dict[str, typing.Union[int, float]]) – python dictionary which contains all the needed parameters with which to compute the Vertical List quality value. IMPORTANT: this method uses the ‘tp’ and ‘fp’ parameters of the Vertical List, not of the dictionary of parameters passed by parameter.

Return type:

float

Returns:

the computed value for the Vertical List quality value.

abstract property fp: int: The number of dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target.

abstract join(other_vertical_list, quality_measure, dict_of_parameters, return_None_if_n_is_0=False)[source]

Method to create a new Vertical List as a result of the join of two Vertical Lists. The join of two Vertical Lists implies the following: (1) the last selector of the list of selectors of the second Vertical List is added to the end of the list of selectors of the first Vertical List, and (2) the new sequences of IDs (both) are the intersection of the corresponding original ones.

Parameters:

other_vertical_list (subgroups.data_structures.vertical_list.VerticalList) – the Vertical List with which to make the join.
quality_measure (subgroups.quality_measures.quality_measure.QualityMeasure) – the quality measure which is used to compute the quality value of the created Vertical List.
dict_of_parameters (dict[str, typing.Union[int, float]]) – python dictionary which contains all the needed parameters with which to compute the Vertical List quality value. IMPORTANT: this method uses the ‘tp’ and ‘fp’ parameters of the created Vertical List, not of the dictionary of parameters passed by parameter.
return_None_if_n_is_0 (bool) – if the subgroup parameter n (i.e., tp + fp) of the resulting Vertical List (i.e., the join) is 0, both sequences of instances are empty and, therefore, the pattern represented by the Vertical List does not appear in any instance of the dataset. If the parameter ‘return_None_if_n_is_0’ is True, None will be returned instead of a Vertical List object. By default, this parameter is False.

Return type:

typing.Optional[subgroups.data_structures.vertical_list.VerticalList]

Returns:

a new Vertical List as a result of the join of this Vertical List (self) and ‘other_vertical_list’.
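
The join rule (append the last selector of the second list, intersect both ID sequences) can be sketched with Python sets. SimpleVerticalList and the free function join below are hypothetical stand-ins. Selectors are plain strings, and the quality-value computation that the real method performs is omitted.

```python
from dataclasses import dataclass


@dataclass
class SimpleVerticalList:
    selectors: list
    tp_ids: set  # IDs covered by the selectors and by the target
    fp_ids: set  # IDs covered by the selectors but not by the target

    @property
    def n(self):
        return len(self.tp_ids) + len(self.fp_ids)


def join(first, second, return_none_if_n_is_0=False):
    """Join two vertical lists as described above (quality value not computed)."""
    joined = SimpleVerticalList(
        selectors=first.selectors + [second.selectors[-1]],
        tp_ids=first.tp_ids & second.tp_ids,
        fp_ids=first.fp_ids & second.fp_ids,
    )
    if return_none_if_n_is_0 and joined.n == 0:
        return None
    return joined


a = SimpleVerticalList(["color=red"], {0, 2, 5}, {1})
b = SimpleVerticalList(["size=S"], {2, 5, 7}, {3})
c = SimpleVerticalList(["shape=round"], {9}, {4})

ab = join(a, b)  # selectors ['color=red', 'size=S'], tp_ids {2, 5}, no fp_ids
ac = join(a, c, return_none_if_n_is_0=True)  # None: no instance is shared
```

Because both sequences can only shrink under intersection, the joined list never has a larger tp, fp or n than either input.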

property list_of_selectors: list[Selector]: The list of selectors represented by the Vertical List.

abstract property n: int: The number of dataset instances which are covered by the selectors (‘list_of_selectors’), no matter the target.

property number_of_dataset_instances: int: Number of instances of the dataset from which this Vertical List has been generated.

property quality_value: int | float: The Vertical List quality value.

abstract property sequence_of_instances_fp: Collection[int]: The sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target.

abstract property sequence_of_instances_tp: Collection[int]: The sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target.

abstract property tp: int: The number of dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target.

subgroups.data_structures.vertical_list_with_bitsets module

This file contains the implementation of a Vertical List data structure whose sequences are implemented using bitsets.

class subgroups.data_structures.vertical_list_with_bitsets.VerticalListWithBitsets(list_of_selectors, sequence_of_instances_tp, sequence_of_instances_fp, number_of_dataset_instances, quality_value)[source]

Bases: VerticalList

This class represents a Vertical List data structure whose sequences are implemented using bitsets.

Parameters:

list_of_selectors (list[subgroups.core.selector.Selector]) – the list of selectors represented by the Vertical List.
sequence_of_instances_tp (collections.abc.Collection[int]) – the sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target. The number of elements in this sequence would be the true positives tp of the equivalent subgroup with the same list of selectors and with the same target.
sequence_of_instances_fp (collections.abc.Collection[int]) – the sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target. The number of elements in this sequence would be the false positives fp of the equivalent subgroup with the same list of selectors and with the same target.
number_of_dataset_instances (int) – number of instances of the dataset.
quality_value (typing.Union[int, float]) – the Vertical List quality value.

compute_quality_value(quality_measure, dict_of_parameters)[source]

Method to compute the Vertical List quality value using the dictionary of parameters passed by parameter. This method uses the parameters ‘tp’ and ‘fp’ of the Vertical List, not of the dictionary of parameters passed by parameter. IMPORTANT: this method does not modify the Vertical List.

Parameters:

quality_measure (subgroups.quality_measures.quality_measure.QualityMeasure) – the quality measure which is used.
dict_of_parameters (dict[str, typing.Union[int, float]]) – python dictionary which contains all needed parameters with which to compute the Vertical List quality value. IMPORTANT: this method uses the ‘tp’ and ‘fp’ parameters of the Vertical List, not of the dictionary of parameters passed by parameter.

Return type:

float

Returns:

the computed value for the Vertical List quality value.

property fp: int: The number of dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target.

join(other_vertical_list, quality_measure, dict_of_parameters, return_None_if_n_is_0=False)[source]

Method to create a new Vertical List as a result of the join of two Vertical Lists. The join of two Vertical Lists implies the following: (1) the last selector of the list of selectors of the second Vertical List is added to the end of the list of selectors of the first Vertical List, and (2) the new sequences of IDs (both) are the intersection of the corresponding original ones.

Parameters:

other_vertical_list (subgroups.data_structures.vertical_list_with_bitsets.VerticalListWithBitsets) – the Vertical List with which to make the join.
quality_measure (subgroups.quality_measures.quality_measure.QualityMeasure) – the quality measure which is used to compute the quality value of the created Vertical List.
dict_of_parameters (dict[str, typing.Union[int, float]]) – python dictionary which contains all needed parameters with which to compute the Vertical List quality value. IMPORTANT: this method uses the ‘tp’ and ‘fp’ parameters of the created Vertical List, not of the dictionary of parameters passed by parameter.
return_None_if_n_is_0 (bool) – if the subgroup parameter n (i.e., tp + fp) of the resulting Vertical List (i.e., the join) is 0, both sequences of instances are empty and, therefore, the pattern represented by the Vertical List does not appear in any instance of the dataset. If the parameter ‘return_None_if_n_is_0’ is True, None will be returned instead of a Vertical List object. By default, this parameter is False.

Return type:

typing.Optional[subgroups.data_structures.vertical_list_with_bitsets.VerticalListWithBitsets]

Returns:

a new Vertical List as a result of the join of this Vertical List (self) and ‘other_vertical_list’.

property n: int: The number of dataset instances which are covered by the selectors (‘list_of_selectors’), no matter the target.

property sequence_of_instances_fp: bitarray: The sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target.

property sequence_of_instances_tp: bitarray: The sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target.

property tp: int: The number of dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target.

subgroups.data_structures.vertical_list_with_sets module

This file contains the implementation of a Vertical List data structure whose sequences are implemented using python sets.

class subgroups.data_structures.vertical_list_with_sets.VerticalListWithSets(list_of_selectors, sequence_of_instances_tp, sequence_of_instances_fp, number_of_dataset_instances, quality_value)[source]

Bases: VerticalList

This class represents a Vertical List data structure whose sequences are implemented using python sets.

Parameters:

list_of_selectors (list[subgroups.core.selector.Selector]) – the list of selectors represented by the Vertical List.
sequence_of_instances_tp (collections.abc.Collection[int]) – the sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target. The number of elements in this sequence would be the true positives tp of the equivalent subgroup with the same list of selectors and with the same target.
sequence_of_instances_fp (collections.abc.Collection[int]) – the sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target. The number of elements in this sequence would be the false positives fp of the equivalent subgroup with the same list of selectors and with the same target.
number_of_dataset_instances (int) – number of instances of the dataset.
quality_value (typing.Union[int, float]) – the Vertical List quality value.

compute_quality_value(quality_measure, dict_of_parameters)[source]

Method to compute the Vertical List quality value using the dictionary of parameters passed by parameter. This method uses the parameters ‘tp’ and ‘fp’ of the Vertical List, not of the dictionary of parameters passed by parameter. IMPORTANT: this method does not modify the Vertical List.

Parameters:

quality_measure (subgroups.quality_measures.quality_measure.QualityMeasure) – the quality measure which is used.
dict_of_parameters (dict[str, typing.Union[int, float]]) – python dictionary which contains all needed parameters with which to compute the Vertical List quality value. IMPORTANT: this method uses the ‘tp’ and ‘fp’ parameters of the Vertical List, not of the dictionary of parameters passed by parameter.

Return type:

float

Returns:

the computed value for the Vertical List quality value.

property fp: int: The number of dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target.

join(other_vertical_list, quality_measure, dict_of_parameters, return_None_if_n_is_0=False)[source]

Method to create a new Vertical List as a result of the join of two Vertical Lists. The join of two Vertical Lists implies the following: (1) the last selector of the list of selectors of the second Vertical List is added to the end of the list of selectors of the first Vertical List, and (2) the new sequences of IDs (both) are the intersection of the corresponding original ones.

Parameters:

other_vertical_list (subgroups.data_structures.vertical_list_with_sets.VerticalListWithSets) – the Vertical List with which to make the join.
quality_measure (subgroups.quality_measures.quality_measure.QualityMeasure) – the quality measure which is used to compute the quality value of the created Vertical List.
dict_of_parameters (dict[str, typing.Union[int, float]]) – python dictionary which contains all needed parameters with which to compute the Vertical List quality value. IMPORTANT: this method uses the ‘tp’ and ‘fp’ parameters of the created Vertical List, not of the dictionary of parameters passed by parameter.
return_None_if_n_is_0 (bool) – if the subgroup parameter n (i.e., tp + fp) of the resulting Vertical List (i.e., the join) is 0, both sequences of instances are empty and, therefore, the pattern represented by the Vertical List does not appear in any instance of the dataset. If the parameter ‘return_None_if_n_is_0’ is True, None will be returned instead of a Vertical List object. By default, this parameter is False.

Return type:

typing.Optional[subgroups.data_structures.vertical_list_with_sets.VerticalListWithSets]

Returns:

a new Vertical List as a result of the join of this Vertical List (self) and ‘other_vertical_list’.

property n: int: The number of dataset instances which are covered by the selectors (‘list_of_selectors’), no matter the target.

property sequence_of_instances_fp: set[int]: The sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target.

property sequence_of_instances_tp: set[int]: The sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target.

property tp: int: The number of dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target.