API reference
- class pymurtree.OptimalDecisionTreeClassifier.OptimalDecisionTreeClassifier(time: int = 600, max_depth: int = 3, max_num_nodes: int | None = None, sparse_coefficient: float = 0.0, verbose: bool = False, all_trees: bool = False, incremental_frequency: bool = True, similarity_lower_bound: bool = True, node_selection: int = 0, feature_ordering: int = 0, random_seed: int = 3, cache_type: int = 0, duplicate_factor: int = 1)
OptimalDecisionTreeClassifier is a class that represents a PyMurTree model.
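The constructor defaults listed in the signature can be collected into a plain dictionary and overridden selectively. This is a minimal sketch: `DEFAULT_PARAMS` and `make_params` are hypothetical names, not part of pymurtree, and the values are copied from the signature above.

```python
# Defaults copied from the constructor signature documented above.
DEFAULT_PARAMS = {
    "time": 600,
    "max_depth": 3,
    "max_num_nodes": None,
    "sparse_coefficient": 0.0,
    "verbose": False,
    "all_trees": False,
    "incremental_frequency": True,
    "similarity_lower_bound": True,
    "node_selection": 0,
    "feature_ordering": 0,
    "random_seed": 3,
    "cache_type": 0,
    "duplicate_factor": 1,
}


def make_params(**overrides):
    """Return constructor keyword arguments with selected defaults replaced.

    Hypothetical helper, not part of pymurtree. Unknown names are rejected
    so that typos do not silently fall through.
    """
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise TypeError(f"unknown parameter(s): {sorted(unknown)}")
    return {**DEFAULT_PARAMS, **overrides}


params = make_params(max_depth=4, time=60)
# With pymurtree installed, the dictionary can be unpacked into the constructor:
#   from pymurtree.OptimalDecisionTreeClassifier import OptimalDecisionTreeClassifier
#   model = OptimalDecisionTreeClassifier(**params)
```

Keeping the defaults in one place makes it easier to log or compare the settings used for different runs.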
- depth() → int
Returns the depth of the tree.
- Parameters:
None
- Returns:
The depth of the tree.
- Return type:
int
- export_dot(out_file: str = '', feature_names: ndarray | None = None, class_names: Dict[int, str] | None = None) → None
Exports the decision tree in DOT format for visualization with Graphviz. The DOT representation is written to out_file if given; otherwise it is printed to standard output.
- Parameters:
out_file (str, optional) – Name of the output file.
feature_names (numpy.ndarray, optional) – 1D NumPy array that represents the names of the features.
class_names (dict, optional) – Dictionary with int keys and str values that represent the class names.
- Return type:
None
- Raises:
ValueError – If the fit method has not been called (self.__tree is None).
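Once a DOT file has been written, it can be rendered with the Graphviz `dot` command-line tool. The sketch below assumes Graphviz is installed and on the PATH. `dot_command` and `export_and_render` are hypothetical helpers, and `model` stands for any classifier exposing the export_dot method documented above.

```python
import subprocess


def dot_command(dot_file, image_file, fmt="png"):
    """Build the Graphviz command line that renders dot_file to image_file."""
    return ["dot", f"-T{fmt}", dot_file, "-o", image_file]


def export_and_render(model, dot_file, image_file,
                      feature_names=None, class_names=None):
    """Write the tree to dot_file, then render it with Graphviz.

    Hypothetical helper. model.export_dot raises ValueError if the model has
    not been fitted; that error is left to propagate to the caller.
    """
    model.export_dot(out_file=dot_file, feature_names=feature_names,
                     class_names=class_names)
    # check=True turns a non-zero exit status from dot into CalledProcessError.
    subprocess.run(dot_command(dot_file, image_file), check=True)
```

A call might look like `export_and_render(model, "tree.dot", "tree.png", class_names={0: "no", 1: "yes"})`, where the file names and class labels are placeholders.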
- export_text(out_file: str = '', feature_names: ndarray | None = None, class_names: Dict[int, str] | None = None) → None
Creates a text representation of all the rules in the decision tree. The text is written to out_file if given; otherwise it is printed to standard output.
- Parameters:
out_file (str, optional) – Name of the output file.
feature_names (numpy.ndarray, optional) – 1D NumPy array that represents the names of the features.
class_names (dict, optional) – Dictionary with int keys and str values that represent the class names.
- Return type:
None
- Raises:
ValueError – If the fit method has not been called (self.__tree is None).
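Because export_text raises ValueError when called before fit, a thin wrapper can turn that condition into a boolean result. This is a sketch only: `try_export_text` is a hypothetical helper, and it treats every ValueError as "not fitted", which is an assumption based on the single condition documented above.

```python
def try_export_text(model, out_file="", feature_names=None, class_names=None):
    """Export the tree rules as text; return False if the export is refused.

    Hypothetical helper. It relies only on the documented behaviour that
    export_text raises ValueError when fit has not been called.
    """
    try:
        model.export_text(out_file=out_file, feature_names=feature_names,
                          class_names=class_names)
    except ValueError:
        # Assumed to mean "model not fitted yet", the only documented cause.
        return False
    return True
```

With an empty out_file the rules go to standard output, so `try_export_text(model)` is enough for a quick look at a fitted tree.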
- fit(x: ndarray, y: ndarray, time: int | None = None, max_depth: int | None = None, max_num_nodes: int | None = None, sparse_coefficient: float | None = None, verbose: bool | None = None, all_trees: bool | None = None, incremental_frequency: bool | None = None, similarity_lower_bound: bool | None = None, node_selection: int | None = None, feature_ordering: int | None = None, random_seed: int | None = None, cache_type: int | None = None, duplicate_factor: int | None = None) → None
Fits a PyMurTree model to the given training data.
- Parameters:
x (numpy.ndarray) – A 2D array that represents the input features of the training data.
y (numpy.ndarray) – A 1D array that represents the target variable of the training data.
time (int, optional) – The maximum time budget in seconds allowed for fitting the model. Defaults to None.
max_depth (int, optional) – The maximum depth of the tree. Defaults to None.
max_num_nodes (int, optional) – The maximum number of nodes in the tree. Defaults to None.
sparse_coefficient (float, optional) – The sparsity coefficient used for tree pruning. Defaults to None.
verbose (bool, optional) – If True, prints the progress of the training process. Defaults to None.
all_trees (bool, optional) – If True, returns all trees generated during the training process. Defaults to None.
incremental_frequency (bool, optional) – If True, uses incremental frequency counting. Defaults to None.
similarity_lower_bound (bool, optional) – If True, uses similarity lower bound pruning. Defaults to None.
node_selection (int, optional) – The method used for node selection. Defaults to None.
feature_ordering (int, optional) – The method used for feature ordering. Defaults to None.
random_seed (int, optional) – The random seed for the training process. Defaults to None.
cache_type (int, optional) – The type of cache used for storing intermediate results. Defaults to None.
duplicate_factor (int, optional) – The duplicate factor used for parallelization. Defaults to None.
- Return type:
None
- Raises:
ValueError – If x or y is None, or if they have a different number of rows.
Examples
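>>> import numpy as np
>>> from pymurtree.OptimalDecisionTreeClassifier import OptimalDecisionTreeClassifier
>>> model = OptimalDecisionTreeClassifier()
>>> x_train = np.array([[1, 2], [3, 4]])
>>> y_train = np.array([0, 1])
>>> model.fit(x_train, y_train)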
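The ValueError conditions documented for fit can also be checked up front, which gives a more specific message before any solver time is spent. The sketch below uses hypothetical helpers (`check_xy`, `fit_checked`) and relies only on `len()`, so it accepts NumPy arrays as well as plain lists. Keyword overrides are passed straight through to fit.

```python
def check_xy(x, y):
    """Mirror the documented ValueError conditions of fit before calling it."""
    if x is None or y is None:
        raise ValueError("x and y must not be None")
    if len(x) != len(y):
        raise ValueError(f"x has {len(x)} rows but y has {len(y)} labels")


def fit_checked(model, x, y, **overrides):
    """Validate the data, then fit with optional per-call overrides.

    Hypothetical helper. overrides are forwarded as keyword arguments of
    fit, for example time=60 or max_depth=4.
    """
    check_xy(x, y)
    model.fit(x, y, **overrides)
    return model
```

For example, `fit_checked(model, x_train, y_train, time=60, max_depth=4)` would request a 60-second budget and a depth limit of 4 for that call.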
- num_nodes() → int
Returns the number of nodes in the tree.
- Parameters:
None
- Returns:
The number of nodes in the tree.
- Return type:
int
- predict(x: ndarray) → ndarray
Predicts the target variable for the given input features.
- Parameters:
x (numpy.ndarray) – A 2D array that represents the input features of the test data. Each row corresponds to an instance, and each column corresponds to a feature.
- Returns:
A 1D array that represents the predicted target variable of the test data. The i-th element corresponds to the predicted target variable for the i-th instance in x.
- Return type:
numpy.ndarray
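Since predict returns one label per row of x, test accuracy can be computed by comparing the result element-wise with the known labels. A minimal sketch, where `accuracy` is a hypothetical helper and `model` is any fitted classifier exposing predict:

```python
def accuracy(model, x, y_true):
    """Fraction of instances in x whose predicted label equals y_true."""
    y_pred = model.predict(x)
    if len(y_true) == 0:
        raise ValueError("y_true must not be empty")
    if len(y_pred) != len(y_true):
        raise ValueError("prediction and label counts differ")
    # The i-th prediction belongs to the i-th row of x, so pair them in order.
    correct = sum(1 for p, t in zip(y_pred, y_true) if p == t)
    return correct / len(y_true)
```

The helper works with NumPy arrays or lists, because it only iterates over the two sequences.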
- score() → int
Returns the misclassification score of the tree.
- Parameters:
None
- Returns:
The misclassification score of the tree.
- Return type:
int
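The three inspection methods depth, num_nodes, and score can be gathered into one summary after fitting. This is a sketch under a stated assumption: it treats score() as the number of misclassified training instances, which this reference does not spell out, so the derived error rate should be read with that caveat. `tree_summary` is a hypothetical helper.

```python
def tree_summary(model, num_training_instances):
    """Collect depth, node count, and score of a fitted model in one dict.

    Assumption: score() counts misclassified training instances, so dividing
    by the training-set size gives a training error rate.
    """
    if num_training_instances <= 0:
        raise ValueError("num_training_instances must be positive")
    score = model.score()
    return {
        "depth": model.depth(),
        "num_nodes": model.num_nodes(),
        "score": score,
        "training_error_rate": score / num_training_instances,
    }
```

Calling `tree_summary(model, len(y_train))` after fit gives a compact record that is convenient for logging or for comparing runs with different max_depth values.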