subgroups.data_structures package
Submodules
subgroups.data_structures.bitset_bsd module
This file contains the implementation of the Bitset data structure used in the BSD algorithm and its variants.
- class subgroups.data_structures.bitset_bsd.BitsetBSD
Bases: object
This class represents a bitset used in the BSD algorithm and its variants.
- property bitset_neg: dict
The bitset dictionary for rows that do not match the target value.
- property bitset_pos: dict
The bitset dictionary for rows that match the target value.
- build_bitset(pandas_dataframe, set_of_frequent_selectors, tuple_target_attribute_value)
Method to build the complete bitset (both bitset dictionaries: the one for the rows that match the target value and the one for the rows that do not) from a set of frequent selectors.
- Parameters:
pandas_dataframe (pandas.core.frame.DataFrame) – Input dataset. It is VERY IMPORTANT to respect the following conditions: (1) the dataset must be a pandas dataframe, (2) the dataset must not contain missing values, (3) for each attribute, all its values must be of the same type.
set_of_frequent_selectors (list) – The set of frequent selectors (L) used to build the bitset.
tuple_target_attribute_value (tuple) – Tuple with the name of the target attribute (first element) and the value of this attribute (second element). EXAMPLE1: (“age”, 25). EXAMPLE2: (“class”, “Setosa”). It is VERY IMPORTANT to respect the following conditions: (1) the name of the target attribute MUST be a string, (2) the name of the target attribute MUST exist in the dataset, (3) the types of the attributes must be respected: the value in the tuple (second element) MUST BE comparable with the values of the corresponding attribute in the dataset, (4) the value of the target attribute MUST exist in the dataset.
- Return type:
None
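To illustrate the idea, the sketch below builds two bitset dictionaries (one for the rows that match the target value, one for the rest) for a few selectors. It is a hypothetical illustration, not the library code: plain Python integers stand in for bitarrays, (attribute, value) tuples stand in for Selector objects, a list of dicts stands in for the DataFrame, and the layout in which bit i of a positive bitset refers to the i-th row matching the target (and likewise for negatives) is an assumption.

```python
def build_bitsets(rows, selectors, target):
    """Return (bitset_pos, bitset_neg): dicts mapping each selector to an int bitset.

    rows: list of dicts, one per dataset row (stand-in for the DataFrame).
    selectors: list of (attribute, value) tuples (stand-ins for Selector objects).
    target: (attribute, value) tuple, like tuple_target_attribute_value.
    """
    target_attribute, target_value = target
    positives = [row for row in rows if row[target_attribute] == target_value]
    negatives = [row for row in rows if row[target_attribute] != target_value]

    def bitset_for(selector, subset):
        attribute, value = selector
        bits = 0
        for i, row in enumerate(subset):
            if row[attribute] == value:
                bits |= 1 << i  # bit i refers to the i-th row of this subset
        return bits

    bitset_pos = {sel: bitset_for(sel, positives) for sel in selectors}
    bitset_neg = {sel: bitset_for(sel, negatives) for sel in selectors}
    return bitset_pos, bitset_neg


rows = [
    {"color": "red", "size": "S", "class": "yes"},
    {"color": "red", "size": "L", "class": "no"},
    {"color": "blue", "size": "S", "class": "yes"},
]
pos, neg = build_bitsets(rows, [("color", "red"), ("size", "S")], ("class", "yes"))
# Counting the set bits gives the tp and fp of each selector.
tp_red = bin(pos[("color", "red")]).count("1")
fp_red = bin(neg[("color", "red")]).count("1")
```

With this layout, the tp and fp of a conjunction of selectors can be obtained by AND-ing their bitsets and counting the set bits.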
- generate_set_of_frequent_selectors(pandas_dataframe, tuple_target_attribute_value, min_support)
Method to scan the dataset (ONLY DISCRETE/NOMINAL ATTRIBUTES) and collect the sorted set of frequent selectors (L).
- Parameters:
pandas_dataframe (pandas.DataFrame) – Input dataset. It is VERY IMPORTANT to respect the following conditions: (1) the dataset must be a pandas dataframe, (2) the dataset must not contain missing values, (3) for each attribute, all its values must be of the same type.
tuple_target_attribute_value (tuple) – Tuple with the name of the target attribute (first element) and with the value of this attribute (second element). EXAMPLE1: (“age”, 25). EXAMPLE2: (“class”, “Setosa”). It is VERY IMPORTANT to respect the following conditions: (1) the name of the target attribute MUST be a string, (2) the name of the target attribute MUST exist in the dataset, (3) it is VERY IMPORTANT to respect the types of the attributes: the value in the tuple (second element) MUST BE comparable with the values of the corresponding attribute in the dataset, (4) the value of the target attribute MUST exist in the dataset.
min_support (int) – Minimum support threshold (NUMBER OF TIMES, NOT A PROPORTION).
- Return type:
list
- Returns:
the sorted set of frequent selectors (L) as a list.
- class subgroups.data_structures.bitset_bsd.BitsetDictionary
Bases: dict
Internal class that implements the dictionaries used in the bitset. Only a Pattern or a Selector can be inserted as a key; if a Selector is inserted, it is converted to a Pattern. Each entry must store a bitarray.
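A minimal sketch of such a key-restricted dictionary follows. The Selector and Pattern classes here are hypothetical stand-ins (the real ones live in subgroups.core), and plain integers replace bitarray values; it illustrates the insertion rule described above, not the library's actual implementation.

```python
class Selector:
    """Hypothetical stand-in for subgroups.core.selector.Selector."""

    def __init__(self, attribute, value):
        self.attribute, self.value = attribute, value


class Pattern:
    """Hypothetical stand-in for subgroups.core.pattern.Pattern (a list of selectors)."""

    def __init__(self, selectors):
        self.selectors = tuple(selectors)

    def _key(self):
        return tuple((s.attribute, s.value) for s in self.selectors)

    def __eq__(self, other):
        return isinstance(other, Pattern) and self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


class BitsetDictionarySketch(dict):
    """dict that only accepts Pattern or Selector keys; Selector keys become Patterns."""

    def __setitem__(self, key, value):
        if isinstance(key, Selector):
            key = Pattern([key])  # a single selector is stored as a one-selector pattern
        elif not isinstance(key, Pattern):
            raise TypeError("Only a Pattern or a Selector can be used as key.")
        if not isinstance(value, int):  # plain ints stand in for bitarray here
            raise TypeError("Each entry must store a bitset.")
        super().__setitem__(key, value)
```

Note that dict.update() and the dict(...) constructor do not go through __setitem__ in a dict subclass, so a complete implementation would also need to guard those entry points.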
subgroups.data_structures.bitset_qfinder module
This file contains the implementation of the Bitset data structure used in the QFinder to create the regression models.
- class subgroups.data_structures.bitset_qfinder.Bitset_QFinder
Bases: object
This class represents a bitset used in the QFinder algorithm.
- compute_credibility_measures(target_column)
Method to compute the credibility measures for each candidate pattern.
- Parameters:
target_column – target column of the dataset.
- Return type:
pandas.core.frame.DataFrame
- Returns:
a pandas DataFrame with the credibility values for each candidate pattern.
- generate_bitset(df, tuple_target_attribute_value, list_of_candidate_patterns)
This method generates a bitset from a dataset and a list of candidate patterns. Each column of the bitset represents a candidate pattern and each row represents an instance of the dataset. The value of each cell is True if the corresponding pattern appears in the corresponding instance and False otherwise.
- Parameters:
df (pandas.core.frame.DataFrame) – dataset from which the bitset is generated.
tuple_target_attribute_value (tuple) – tuple which contains the name of the target attribute and its value.
list_of_candidate_patterns (list[subgroups.core.pattern.Pattern]) – list of candidate patterns.
- Return type:
None
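The structure described above can be pictured as a boolean matrix. The following stdlib-only sketch (a hypothetical helper, not the library's method) builds it from a list of dicts, treating each candidate pattern as a list of (attribute, value) pairs that appears in an instance when all of its pairs hold; the target tuple is not needed for this step in the sketch.

```python
def pattern_matrix(rows, patterns):
    """Return a list of rows of booleans: one row per instance, one column per pattern.

    A pattern is a list of (attribute, value) pairs; it appears in an instance
    when every pair holds in that instance (a conjunction of selectors).
    """
    return [
        [all(row[attribute] == value for attribute, value in pattern) for pattern in patterns]
        for row in rows
    ]


rows = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "S"},
]
patterns = [[("color", "red")], [("color", "red"), ("size", "S")]]
matrix = pattern_matrix(rows, patterns)
# Column sums give the number of instances in which each pattern appears.
coverage = [sum(column) for column in zip(*matrix)]
```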
subgroups.data_structures.fp_tree_for_sdmap module
This file contains the implementation of the FPTree data structure used in the SDMap algorithm.
- class subgroups.data_structures.fp_tree_for_sdmap.FPTreeForSDMap
Bases: object
This class represents the FPTree data structure used in the SDMap algorithm.
- build_tree(pandas_dataframe, set_of_frequent_selectors, target)
Method to build the complete FPTree from a pandas DataFrame and using the set of frequent selectors. IMPORTANT: missing values are not supported yet.
- Parameters:
pandas_dataframe (pandas.core.frame.DataFrame) – the DataFrame which is scanned. IMPORTANT: missing values are not supported yet.
set_of_frequent_selectors (dict[str, tuple[subgroups.core.selector.Selector, list[int], int]]) – the set of frequent selectors generated by the method ‘generate_set_of_frequent_selectors’.
target (tuple[str, typing.Union[int, float, str]]) – a tuple with 2 elements: the target attribute name and the target value.
- Return type:
None
- generate_conditional_fp_tree(list_of_selectors, minimum_tp=None, minimum_fp=None, minimum_n=None)
Method to get the conditional FPTree with a list of selectors. Two threshold types can be used: (1) the true positives tp and the false positives fp separately or (2) the subgroup description size n (n = tp + fp). This means that: (1) if ‘minimum_tp’ and ‘minimum_fp’ have a value of type ‘int’, ‘minimum_n’ must be None; and (2) if ‘minimum_n’ has a value of type ‘int’, ‘minimum_tp’ and ‘minimum_fp’ must be None.
- Parameters:
list_of_selectors (list[subgroups.core.selector.Selector]) – the list of selectors which is used. IMPORTANT: we assume that this list only contains Selector objects.
minimum_tp (typing.Optional[int]) – the minimum true positives (tp) threshold.
minimum_fp (typing.Optional[int]) – the minimum false positives (fp) threshold.
minimum_n (typing.Optional[int]) – the minimum subgroup description size (n) threshold.
- Return type:
subgroups.data_structures.fp_tree_for_sdmap.FPTreeForSDMap
- Returns:
the generated conditional FPTree.
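The rule about the two mutually exclusive threshold types can be expressed as a small validation helper. This is a hypothetical sketch of the check described above (the function name and the ValueError are assumptions), not the library's own code.

```python
from typing import Optional


def threshold_mode(minimum_tp: Optional[int] = None,
                   minimum_fp: Optional[int] = None,
                   minimum_n: Optional[int] = None) -> str:
    """Return 'tp_fp' or 'n' according to which threshold type is used.

    Valid combinations: (1) minimum_tp and minimum_fp are ints and minimum_n is None,
    or (2) minimum_n is an int and minimum_tp and minimum_fp are None.
    """
    if isinstance(minimum_tp, int) and isinstance(minimum_fp, int) and minimum_n is None:
        return "tp_fp"
    if isinstance(minimum_n, int) and minimum_tp is None and minimum_fp is None:
        return "n"
    raise ValueError("Use either minimum_tp and minimum_fp, or minimum_n (but not both).")
```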
- generate_set_of_frequent_selectors(pandas_dataframe, target, minimum_tp=None, minimum_fp=None, minimum_n=None)
Method to scan the pandas DataFrame in order to generate the set of frequent selectors. Two threshold types can be used: (1) the true positives tp and the false positives fp separately or (2) the subgroup description size n (n = tp + fp). This means that: (1) if ‘minimum_tp’ and ‘minimum_fp’ have a value of type ‘int’, ‘minimum_n’ must be None; and (2) if ‘minimum_n’ has a value of type ‘int’, ‘minimum_tp’ and ‘minimum_fp’ must be None. IMPORTANT: missing values are not supported yet.
- Parameters:
pandas_dataframe (pandas.core.frame.DataFrame) – the DataFrame which is scanned. IMPORTANT: missing values are not supported yet.
target (tuple[str, typing.Union[int, float, str]]) – a tuple with 2 elements: the target attribute name and the target value.
minimum_tp (typing.Optional[int]) – the minimum true positives (tp) threshold.
minimum_fp (typing.Optional[int]) – the minimum false positives (fp) threshold.
minimum_n (typing.Optional[int]) – the minimum subgroup description size (n) threshold.
- Return type:
dict[str, tuple[subgroups.core.selector.Selector, list[int], int]]
- Returns:
a dictionary in which the keys are strings (the concatenation of the selector attribute name and the selector value) and the values are tuples with 3 elements: (1) the selector, (2) a list with 2 elements: its true positives tp and its false positives fp, and (3) a number indicating the insertion order in this dictionary (starting from 0).
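The following hypothetical sketch produces a dictionary with the structure described above, using the n threshold (n = tp + fp). Assumptions not taken from the library: rows are dicts, selectors are (attribute, value) tuples, the target attribute itself is skipped, and the insertion order follows the order in which selectors are first found while scanning.

```python
def frequent_selectors(rows, target, minimum_n):
    """Sketch of a {key: (selector, [tp, fp], insertion_order)} dictionary.

    rows: list of dicts; target: (attribute, value); minimum_n: threshold on n = tp + fp.
    """
    target_attribute, target_value = target
    counts = {}  # key -> (selector, [tp, fp]), in first-seen order
    for row in rows:
        is_positive = row[target_attribute] == target_value
        for attribute, value in row.items():
            if attribute == target_attribute:
                continue  # assumption: the target attribute is not used as a selector
            key = str(attribute) + str(value)  # concatenation of name and value
            _, tp_fp = counts.setdefault(key, ((attribute, value), [0, 0]))
            tp_fp[0 if is_positive else 1] += 1
    result = {}
    for key, (selector, tp_fp) in counts.items():
        if tp_fp[0] + tp_fp[1] >= minimum_n:
            result[key] = (selector, tp_fp, len(result))  # insertion order from 0
    return result


rows = [
    {"color": "red", "size": "S", "class": "yes"},
    {"color": "red", "size": "L", "class": "no"},
    {"color": "blue", "size": "S", "class": "yes"},
]
selectors = frequent_selectors(rows, ("class", "yes"), minimum_n=2)
```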
- header_table_as_str(follow_node_links=True)
Method to return all the entries of the FPTree header table as a string.
- Parameters:
follow_node_links (bool) – whether to print all the FPTreeNode ids in the horizontal list (following the node links) or only the first one. By default, True.
- Return type:
str
- Returns:
the header table as a string.
- is_empty()
Method to check whether the FPTree only has the root node.
- Return type:
bool
- Returns:
whether the FPTree only has the root node.
- property root_node: FPTreeNode
The root of the tree.
- property sorted_header_table: list
A list with the selectors of the header table, sorted according to the sum of ‘n’ (i.e., the sum of the true positives tp plus the sum of the false positives fp).
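In an FPTree, each header table entry points to the first node with a given selector, and the node links chain the remaining ones. The hypothetical sketch below sums tp + fp along that chain and sorts the selectors by the result; the Node stand-in, the [tp, fp] counter layout, and the descending order are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for FPTreeNode with counters = [tp, fp]."""
    selector: str
    counters: list
    node_link: Optional["Node"] = None  # next node with the same selector


def summed_n(first_node: Optional[Node]) -> int:
    """Follow the node links and add up tp + fp of every node found."""
    total, node = 0, first_node
    while node is not None:
        total += node.counters[0] + node.counters[1]
        node = node.node_link
    return total


def sorted_header_table(header_table: dict) -> list:
    """Selectors sorted by their summed n (descending order is an assumption)."""
    return sorted(header_table, key=lambda s: summed_n(header_table[s]), reverse=True)


second_a = Node("a", [1, 0])
first_a = Node("a", [2, 1], node_link=second_a)
first_b = Node("b", [3, 2])
header_table = {"a": first_a, "b": first_b}
```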
subgroups.data_structures.fp_tree_for_sdmapstar module
This file contains the implementation of the FPTree data structure used in the SDMapStar algorithm.
- class subgroups.data_structures.fp_tree_for_sdmapstar.FPTreeForSDMapStar(TP, FP)
Bases: FPTreeForSDMap
This class represents the FPTree data structure used in the SDMapStar algorithm.
- generate_conditional_fp_tree_star(list_of_selectors, min_optimistic_estimate, optimistic_estimate, additional_parameters={}, minimum_tp=None, minimum_fp=None, minimum_n=None)
Method to get the conditional FPTree with a list of selectors. Two threshold types can be used: (1) the true positives tp and the false positives fp separately or (2) the subgroup description size n (n = tp + fp). This means that: (1) if ‘minimum_tp’ and ‘minimum_fp’ have a value of type ‘int’, ‘minimum_n’ must be None; and (2) if ‘minimum_n’ has a value of type ‘int’, ‘minimum_tp’ and ‘minimum_fp’ must be None.
- Parameters:
list_of_selectors (list[subgroups.core.selector.Selector]) – the list of selectors which is used. IMPORTANT: we assume that this list only contains Selector objects.
min_optimistic_estimate (int) – the minimum optimistic estimate threshold.
optimistic_estimate (subgroups.quality_measures.quality_measure.QualityMeasure) – the optimistic estimate quality measure.
additional_parameters (dict) – the additional parameters for the optimistic estimate quality measure.
minimum_tp (typing.Optional[int]) – the minimum true positives (tp) threshold.
minimum_fp (typing.Optional[int]) – the minimum false positives (fp) threshold.
minimum_n (typing.Optional[int]) – the minimum subgroup description size (n) threshold.
- Return type:
tuple[subgroups.data_structures.fp_tree_for_sdmapstar.FPTreeForSDMapStar, int]
- Returns:
the generated conditional FPTree and the number of pruned branches.
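The idea behind optimistic-estimate pruning can be sketched with WRAcc, whose optimistic estimate is (tp / N) * (1 - TP / N) with N = TP + FP: the value a refinement would reach if it kept all of its tp and dropped every fp. WRAcc is only an example here; the candidate layout, the helper names, and the rule of pruning whatever falls below min_optimistic_estimate are assumptions, not the library's implementation.

```python
def wracc_optimistic_estimate(tp: int, TP: int, FP: int) -> float:
    """Upper bound on the WRAcc of any refinement: (tp / N) * (1 - TP / N)."""
    N = TP + FP
    return (tp / N) * (1 - TP / N)


def prune_candidates(candidates, min_optimistic_estimate, TP, FP):
    """Split (selector, tp, fp) candidates into kept ones and a pruned count."""
    kept, pruned = [], 0
    for selector, tp, fp in candidates:
        if wracc_optimistic_estimate(tp, TP, FP) < min_optimistic_estimate:
            pruned += 1  # no refinement of this branch can reach the threshold
        else:
            kept.append((selector, tp, fp))
    return kept, pruned


kept, pruned = prune_candidates([("a", 4, 0), ("b", 1, 3)], 0.1, TP=4, FP=4)
```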
subgroups.data_structures.fp_tree_node module
This file contains the implementation of a generic FPTree Node.
- class subgroups.data_structures.fp_tree_node.FPTreeNode(selector, counters, node_link)
Bases: object
This class represents a generic FPTree Node.
- Parameters:
selector (subgroups.core.selector.Selector) – the Selector which is represented by this node.
counters (list[int]) – a list with the needed counters (the meaning of its elements depends on the situation). IMPORTANT: we assume that this list only contains values of type ‘int’.
node_link (typing.Optional[subgroups.data_structures.fp_tree_node.FPTreeNode]) – the next node in the FPTree with the same selector as this one (or None if it does not exist).
- add_child(child_node)
Method to add a child node to the current node. The current node will be the parent of the added child node. IMPORTANT: if there is already a child node with the same selector, a DuplicateFpTreeNodeError exception is raised.
- Parameters:
child_node (subgroups.data_structures.fp_tree_node.FPTreeNode) – the child node which is added.
- Return type:
None
- property counters: list[int]
A list with the needed counters (the meaning of its elements depends on the situation). IMPORTANT: we assume that this list only contains values of type ‘int’.
- delete_child_by_selector(selector)
Method to delete a child node from the current node by selector. The current node will not be the parent of the deleted child node anymore. IMPORTANT: if there is no child node with the selector, a KeyError exception is raised.
- Parameters:
selector (subgroups.core.selector.Selector) – the selector which is used in order to delete the child node.
- Return type:
None
- get_child_by_selector(selector)
Method to get the child node whose selector is passed as an argument. IMPORTANT: if there is no child node with that selector, this method returns None.
- Parameters:
selector (subgroups.core.selector.Selector) – the selector which is checked.
- Return type:
typing.Optional[subgroups.data_structures.fp_tree_node.FPTreeNode]
- Returns:
the child node with that selector, or None if it does not exist.
- has_this_child(node)
Method to check whether the node passed by parameter is a child of this one.
- Parameters:
node (subgroups.data_structures.fp_tree_node.FPTreeNode) – the node which is checked.
- Return type:
bool
- Returns:
whether the node passed by parameter is a child of this one.
- is_child_of(node)
Method to check whether the node passed as an argument is the parent of this one or, if None is passed, whether this node has no parent.
- Parameters:
node (typing.Optional[subgroups.data_structures.fp_tree_node.FPTreeNode]) – the node which is checked, or None.
- Return type:
bool
- Returns:
whether the node passed as an argument is the parent of this one or, if None was passed, whether this node has no parent.
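The child-management semantics above (duplicate children rejected, KeyError when deleting a missing child, None when a child is not found, None meaning "no parent") can be captured with a dictionary of children keyed by selector. This is a hypothetical sketch assuming hashable selectors; DuplicateFpTreeNodeError is redefined locally as a stand-in for the library's exception.

```python
class DuplicateFpTreeNodeError(Exception):
    """Local stand-in for the library exception of the same name."""


class NodeSketch:
    """Minimal FPTree node: children are stored in a dict keyed by selector."""

    def __init__(self, selector, counters, node_link=None):
        self.selector = selector
        self.counters = counters
        self.node_link = node_link
        self.parent = None
        self._children = {}

    def add_child(self, child_node):
        if child_node.selector in self._children:
            raise DuplicateFpTreeNodeError(child_node.selector)
        self._children[child_node.selector] = child_node
        child_node.parent = self

    def delete_child_by_selector(self, selector):
        child = self._children.pop(selector)  # raises KeyError if missing
        child.parent = None

    def get_child_by_selector(self, selector):
        return self._children.get(selector)  # None if missing

    def has_this_child(self, node):
        return self._children.get(node.selector) is node

    def is_child_of(self, node):
        # With node=None this answers "does this node have no parent?"
        return self.parent is node

    @property
    def number_of_children(self):
        return len(self._children)
```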
- property node_link: FPTreeNode | None
The next node in the FPTree with the same selector as this one (or None if it does not exist).
- property number_of_children: int
The number of children of this node.
- property parent: FPTreeNode | None
The parent of this node.
- tree_as_str(current_depth=0)
Method to return, as a string, the current node and the complete subtree rooted at it.
- Parameters:
current_depth (int) – the depth of the current node. By default, 0.
- Return type:
str
- Returns:
the printed result (the current node and the complete subtree from the current node).
subgroups.data_structures.subgroup_list module
This file contains the implementation of the Subgroup List data structure.
- class subgroups.data_structures.subgroup_list.SubgroupList(dataset_target_bitarray_of_positives, dataset_target_bitarray_of_negatives, number_of_dataset_instances)
Bases: object
This class represents a Subgroup List.
- Parameters:
dataset_target_bitarray_of_positives (bitarray.bitarray) – the bitarray of the dataset instances which are covered by the target (its length must be equal to the number of instances of the dataset).
dataset_target_bitarray_of_negatives (bitarray.bitarray) – the bitarray of the dataset instances which are not covered by the target (its length must be equal to the number of instances of the dataset).
number_of_dataset_instances (int) – number of instances of the dataset.
- add_subgroup(subgroup, bitarray_of_positives, bitarray_of_negatives)
Method to add an individual subgroup at the end of the subgroup list (and before the default rule).
- Parameters:
subgroup (subgroups.core.subgroup.Subgroup) – subgroup which is added.
bitarray_of_positives (bitarray.bitarray) – the bitarray of the dataset instances (considering the complete dataset) which are covered by the subgroup description and by the subgroup target.
bitarray_of_negatives (bitarray.bitarray) – the bitarray of the dataset instances (considering the complete dataset) which are covered by the subgroup description, but not by the subgroup target.
- Return type:
None
- property dataset_number_of_negatives: int
Number of instances (considering the complete dataset) which are not covered by the target.
- property dataset_number_of_positives: int
Number of instances (considering the complete dataset) which are covered by the target.
- property dataset_target_distribution: float
Target distribution considering the complete dataset.
- property default_rule_bitarray_of_negatives: bitarray
The bitarray of the dataset instances which are not covered by the target.
- property default_rule_bitarray_of_positives: bitarray
The bitarray of the dataset instances which are covered by the target.
- delete_last_subgroup()
Method to delete the last individual subgroup from the subgroup list. If the subgroup list is empty, this method does nothing.
- Return type:
None
- get_subgroup_bitarray_of_negatives(index)
Get the bitarray of negatives of the subgroup in the position ‘index’. This bitarray depends on the position of the subgroup in the list (i.e., it DOES NOT consider the complete dataset).
- Return type:
bitarray.bitarray
- get_subgroup_bitarray_of_positives(index)
Get the bitarray of positives of the subgroup in the position ‘index’. This bitarray depends on the position of the subgroup in the list (i.e., it DOES NOT consider the complete dataset).
- Return type:
bitarray.bitarray
- get_subgroup_original_bitarray_of_negatives(index)
Get the original bitarray of negatives of the subgroup in the position ‘index’. This bitarray considers the subgroup individually (i.e., with respect to the complete dataset).
- Return type:
bitarray.bitarray
- get_subgroup_original_bitarray_of_positives(index)
Get the original bitarray of positives of the subgroup in the position ‘index’. This bitarray considers the subgroup individually (i.e., with respect to the complete dataset).
- Return type:
bitarray.bitarray
- property number_of_dataset_instances: int
Number of instances of the dataset.
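The difference between the positional bitarrays and the original ones can be sketched as follows, assuming the usual subgroup-list semantics in which a subgroup only describes the instances not already covered by the subgroups before it. Plain integers stand in for bitarrays (bit i is dataset instance i), and the class and method names are hypothetical; the target bitarrays and the default rule are left out.

```python
class SubgroupListSketch:
    """Stores the original (complete-dataset) bitsets of each subgroup, in order."""

    def __init__(self):
        self._original = []  # list of (positives, negatives) int bitsets

    def add_subgroup(self, positives, negatives):
        self._original.append((positives, negatives))

    def delete_last_subgroup(self):
        if self._original:  # does nothing on an empty list
            self._original.pop()

    def original_bitsets(self, index):
        return self._original[index]

    def positional_bitsets(self, index):
        # Instances already covered by an earlier subgroup are removed.
        covered_before = 0
        for positives, negatives in self._original[:index]:
            covered_before |= positives | negatives
        positives, negatives = self._original[index]
        return positives & ~covered_before, negatives & ~covered_before
```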
subgroups.data_structures.vertical_list module
This file contains the implementation of the root class of all the implemented Vertical Lists (data structure used by the VLSD algorithm). Conceptually, a Vertical List is similar to a Subgroup. This class is an abstract class and cannot be instantiated.
- class subgroups.data_structures.vertical_list.VerticalList(list_of_selectors, sequence_of_instances_tp, sequence_of_instances_fp, number_of_dataset_instances, quality_value)
Bases: ABC
This abstract class defines the root class of all the implemented Vertical Lists (data structure used by the VLSD algorithm). Conceptually, a Vertical List is similar to a Subgroup.
- abstract compute_quality_value(quality_measure, dict_of_parameters)
Method to compute the Vertical List quality value using the dictionary of parameters passed as an argument. This method uses the ‘tp’ and ‘fp’ values of the Vertical List, not those in the dictionary of parameters. IMPORTANT: this method does not modify the Vertical List.
- Parameters:
quality_measure (subgroups.quality_measures.quality_measure.QualityMeasure) – the quality measure which is used.
dict_of_parameters (dict[str, typing.Union[int, float]]) – Python dictionary which contains all the parameters needed to compute the Vertical List quality value. IMPORTANT: this method uses the ‘tp’ and ‘fp’ values of the Vertical List, not those in this dictionary.
- Return type:
float
- Returns:
the computed value for the Vertical List quality value.
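A hypothetical sketch of the tp/fp override described above, with WRAcc as an example measure written as a plain function (the library's QualityMeasure objects are not used here). Copying the dictionary is a choice made in the sketch so that the caller's dictionary is not modified either.

```python
def wracc(params):
    """Weighted relative accuracy: (n / N) * (tp / n - TP / N)."""
    N = params["TP"] + params["FP"]
    n = params["tp"] + params["fp"]
    if n == 0:
        return 0.0
    return (n / N) * (params["tp"] / n - params["TP"] / N)


def compute_quality_value(tp, fp, quality_measure, dict_of_parameters):
    """Evaluate quality_measure with the Vertical List's own tp and fp."""
    params = dict(dict_of_parameters)  # shallow copy; the input stays untouched
    params["tp"], params["fp"] = tp, fp  # override whatever the dict contained
    return quality_measure(params)


shared = {"TP": 4, "FP": 4, "tp": 0, "fp": 0}
value = compute_quality_value(3, 1, wracc, shared)
```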
- abstract property fp: int
The number of dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target.
- abstract join(other_vertical_list, quality_measure, dict_of_parameters, return_None_if_n_is_0=False)
Method to create a new Vertical List as a result of the join of two Vertical Lists. The join of two Vertical Lists implies the following: (1) the last selector of the list of selectors of the second Vertical List is added to the end of the list of selectors of the first Vertical List, and (2) the new sequences of IDs (both) are the intersection of the corresponding original ones.
- Parameters:
other_vertical_list (subgroups.data_structures.vertical_list.VerticalList) – the Vertical List with which to make the join.
quality_measure (subgroups.quality_measures.quality_measure.QualityMeasure) – the quality measure which is used to compute the quality value of the created Vertical List.
dict_of_parameters (dict[str, typing.Union[int, float]]) – Python dictionary which contains all the parameters needed to compute the Vertical List quality value. IMPORTANT: this method uses the ‘tp’ and ‘fp’ values of the created Vertical List, not those in this dictionary.
return_None_if_n_is_0 (bool) – if the subgroup parameter n (i.e., tp + fp) of the resulting Vertical List (i.e., the join) is 0, both sequences of instances are empty and, therefore, the pattern represented by the Vertical List does not appear in any instance of the dataset. If ‘return_None_if_n_is_0’ is True, None is returned in that case instead of a Vertical List object. By default, this parameter is False.
- Return type:
typing.Optional[subgroups.data_structures.vertical_list.VerticalList]
- Returns:
a new Vertical List as a result of the join of this Vertical List (self) and ‘other_vertical_list’.
- abstract property n: int
The number of dataset instances which are covered by the selectors (‘list_of_selectors’), no matter the target.
- property number_of_dataset_instances: int
Number of instances of the dataset from which this Vertical List has been generated.
- property quality_value: int | float
The Vertical List quality value.
- abstract property sequence_of_instances_fp: Collection[int]
The sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target.
- abstract property sequence_of_instances_tp: Collection[int]
The sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target.
- abstract property tp: int
The number of dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target.
subgroups.data_structures.vertical_list_with_bitsets module
This file contains the implementation of a Vertical List data structure whose sequences are implemented using bitsets.
- class subgroups.data_structures.vertical_list_with_bitsets.VerticalListWithBitsets(list_of_selectors, sequence_of_instances_tp, sequence_of_instances_fp, number_of_dataset_instances, quality_value)
Bases: VerticalList
This class represents a Vertical List data structure whose sequences are implemented using bitsets.
- Parameters:
list_of_selectors (list[subgroups.core.selector.Selector]) – the list of selectors represented by the Vertical List.
sequence_of_instances_tp (collections.abc.Collection[int]) – the sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target. The number of elements in this sequence equals the true positives tp of the equivalent subgroup (the one with the same list of selectors and the same target).
sequence_of_instances_fp (collections.abc.Collection[int]) – the sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target. The number of elements in this sequence equals the false positives fp of the equivalent subgroup (the one with the same list of selectors and the same target).
number_of_dataset_instances (int) – number of instances of the dataset.
quality_value (typing.Union[int, float]) – the Vertical List quality value.
- compute_quality_value(quality_measure, dict_of_parameters)
Method to compute the Vertical List quality value using the dictionary of parameters passed as an argument. This method uses the ‘tp’ and ‘fp’ values of the Vertical List, not those in the dictionary of parameters. IMPORTANT: this method does not modify the Vertical List.
- Parameters:
quality_measure (subgroups.quality_measures.quality_measure.QualityMeasure) – the quality measure which is used.
dict_of_parameters (dict[str, typing.Union[int, float]]) – Python dictionary which contains all the parameters needed to compute the Vertical List quality value. IMPORTANT: this method uses the ‘tp’ and ‘fp’ values of the Vertical List, not those in this dictionary.
- Return type:
float
- Returns:
the computed value for the Vertical List quality value.
- property fp: int
The number of dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target.
- join(other_vertical_list, quality_measure, dict_of_parameters, return_None_if_n_is_0=False)
Method to create a new Vertical List as a result of the join of two Vertical Lists. The join of two Vertical Lists implies the following: (1) the last selector of the list of selectors of the second Vertical List is added to the end of the list of selectors of the first Vertical List, and (2) the new sequences of IDs (both) are the intersection of the corresponding original ones.
- Parameters:
other_vertical_list (subgroups.data_structures.vertical_list_with_bitsets.VerticalListWithBitsets) – the Vertical List with which to make the join.
quality_measure (subgroups.quality_measures.quality_measure.QualityMeasure) – the quality measure which is used to compute the quality value of the created Vertical List.
dict_of_parameters (dict[str, typing.Union[int, float]]) – Python dictionary which contains all the parameters needed to compute the Vertical List quality value. IMPORTANT: this method uses the ‘tp’ and ‘fp’ values of the created Vertical List, not those in this dictionary.
return_None_if_n_is_0 (bool) – if the subgroup parameter n (i.e., tp + fp) of the resulting Vertical List (i.e., the join) is 0, both sequences of instances are empty and, therefore, the pattern represented by the Vertical List does not appear in any instance of the dataset. If ‘return_None_if_n_is_0’ is True, None is returned in that case instead of a Vertical List object. By default, this parameter is False.
- Return type:
typing.Optional[subgroups.data_structures.vertical_list_with_bitsets.VerticalListWithBitsets]
- Returns:
a new Vertical List as a result of the join of this Vertical List (self) and ‘other_vertical_list’.
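A hypothetical sketch of the join on bitset-backed sequences, where plain Python integers stand in for bitarrays (bit i set means dataset instance i is in the sequence); intersecting two sequences then becomes a bitwise AND, and tp/fp become set-bit counts. The quality value computation is left out, and the class name is an assumption.

```python
from typing import Optional


class BitsetVerticalListSketch:
    """Hypothetical Vertical List whose sequences are int bitsets (bit i = instance i)."""

    def __init__(self, selectors: list, tp_bits: int, fp_bits: int):
        self.selectors = selectors
        self.tp_bits = tp_bits
        self.fp_bits = fp_bits

    @property
    def tp(self) -> int:
        return bin(self.tp_bits).count("1")

    @property
    def fp(self) -> int:
        return bin(self.fp_bits).count("1")

    @property
    def n(self) -> int:
        return self.tp + self.fp

    def join(self, other: "BitsetVerticalListSketch",
             return_None_if_n_is_0: bool = False) -> Optional["BitsetVerticalListSketch"]:
        joined = BitsetVerticalListSketch(
            self.selectors + [other.selectors[-1]],  # append other's last selector
            self.tp_bits & other.tp_bits,  # intersection = bitwise AND
            self.fp_bits & other.fp_bits,
        )
        if return_None_if_n_is_0 and joined.n == 0:
            return None
        return joined
```

Compared with the set-based variant, fixed-length bitsets make intersection a single bitwise operation, which tends to pay off when the sequences are dense, whereas sets avoid storing anything for absent instances when the sequences are sparse.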
- property n: int
The number of dataset instances which are covered by the selectors (‘list_of_selectors’), no matter the target.
- property sequence_of_instances_fp: bitarray
The sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target.
- property sequence_of_instances_tp: bitarray
The sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target.
- property tp: int
The number of dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target.
subgroups.data_structures.vertical_list_with_sets module
This file contains the implementation of a Vertical List data structure whose sequences are implemented using python sets.
- class subgroups.data_structures.vertical_list_with_sets.VerticalListWithSets(list_of_selectors, sequence_of_instances_tp, sequence_of_instances_fp, number_of_dataset_instances, quality_value)
Bases: VerticalList
This class represents a Vertical List data structure whose sequences are implemented using python sets.
- Parameters:
list_of_selectors (list[subgroups.core.selector.Selector]) – the list of selectors represented by the Vertical List.
sequence_of_instances_tp (collections.abc.Collection[int]) – the sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target. The number of elements in this sequence equals the true positives tp of the equivalent subgroup (the one with the same list of selectors and the same target).
sequence_of_instances_fp (collections.abc.Collection[int]) – the sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target. The number of elements in this sequence equals the false positives fp of the equivalent subgroup (the one with the same list of selectors and the same target).
number_of_dataset_instances (int) – number of instances of the dataset.
quality_value (typing.Union[int, float]) – the Vertical List quality value.
- compute_quality_value(quality_measure, dict_of_parameters)
Method to compute the Vertical List quality value using the dictionary of parameters passed as an argument. This method uses the ‘tp’ and ‘fp’ values of the Vertical List, not those in the dictionary of parameters. IMPORTANT: this method does not modify the Vertical List.
- Parameters:
quality_measure (subgroups.quality_measures.quality_measure.QualityMeasure) – the quality measure which is used.
dict_of_parameters (dict[str, typing.Union[int, float]]) – Python dictionary which contains all the parameters needed to compute the Vertical List quality value. IMPORTANT: this method uses the ‘tp’ and ‘fp’ values of the Vertical List, not those in this dictionary.
- Return type:
float
- Returns:
the computed value for the Vertical List quality value.
- property fp: int
The number of dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target.
- join(other_vertical_list, quality_measure, dict_of_parameters, return_None_if_n_is_0=False)
Method to create a new Vertical List as a result of the join of two Vertical Lists. The join of two Vertical Lists implies the following: (1) the last selector of the list of selectors of the second Vertical List is added to the end of the list of selectors of the first Vertical List, and (2) the new sequences of IDs (both) are the intersection of the corresponding original ones.
- Parameters:
other_vertical_list (subgroups.data_structures.vertical_list_with_sets.VerticalListWithSets) – the Vertical List with which to make the join.
quality_measure (subgroups.quality_measures.quality_measure.QualityMeasure) – the quality measure which is used to compute the quality value of the created Vertical List.
dict_of_parameters (dict[str, typing.Union[int, float]]) – Python dictionary which contains all the parameters needed to compute the Vertical List quality value. IMPORTANT: this method uses the ‘tp’ and ‘fp’ values of the created Vertical List, not those in this dictionary.
return_None_if_n_is_0 (bool) – if the subgroup parameter n (i.e., tp + fp) of the resulting Vertical List (i.e., the join) is 0, both sequences of instances are empty and, therefore, the pattern represented by the Vertical List does not appear in any instance of the dataset. If ‘return_None_if_n_is_0’ is True, None is returned in that case instead of a Vertical List object. By default, this parameter is False.
- Return type:
typing.Optional[subgroups.data_structures.vertical_list_with_sets.VerticalListWithSets]
- Returns:
a new Vertical List as a result of the join of this Vertical List (self) and ‘other_vertical_list’.
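The two steps of the join described above (append the other list's last selector, intersect both ID sequences) map directly onto Python set intersection. This is a hypothetical sketch with the quality value computation left out; the class name and fields are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerticalListSketch:
    """Hypothetical set-based Vertical List (quality value omitted)."""
    selectors: list
    tp_ids: set  # IDs covered by the selectors and by the target
    fp_ids: set  # IDs covered by the selectors but not by the target

    @property
    def n(self) -> int:
        return len(self.tp_ids) + len(self.fp_ids)

    def join(self, other: "VerticalListSketch",
             return_None_if_n_is_0: bool = False) -> Optional["VerticalListSketch"]:
        joined = VerticalListSketch(
            self.selectors + [other.selectors[-1]],  # append other's last selector
            self.tp_ids & other.tp_ids,  # intersect the tp sequences
            self.fp_ids & other.fp_ids,  # intersect the fp sequences
        )
        if return_None_if_n_is_0 and joined.n == 0:
            return None
        return joined
```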
- property n: int
The number of dataset instances which are covered by the selectors (‘list_of_selectors’), no matter the target.
- property sequence_of_instances_fp: set[int]
The sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’), but not by the target.
- property sequence_of_instances_tp: set[int]
The sequence of IDs of the dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target.
- property tp: int
The number of dataset instances which are covered by the selectors (‘list_of_selectors’) and also by the target.