API reference

class pymurtree.OptimalDecisionTreeClassifier.OptimalDecisionTreeClassifier(time: int = 600, max_depth: int = 3, max_num_nodes: int | None = None, sparse_coefficient: float = 0.0, verbose: bool = False, all_trees: bool = False, incremental_frequency: bool = True, similarity_lower_bound: bool = True, node_selection: int = 0, feature_ordering: int = 0, random_seed: int = 3, cache_type: int = 0, duplicate_factor: int = 1)

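OptimalDecisionTreeClassifier represents a PyMurTree model: a classifier that uses the MurTree algorithm to search for an optimal decision tree within the given depth, node, and time limits.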
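The max_depth and max_num_nodes arguments jointly bound the size of the tree. The sketch below computes the effective node budget. It assumes, as in the MurTree algorithm, that max_num_nodes counts feature (decision) nodes only. The helper name node_budget is hypothetical, and clamping an oversized limit is an illustrative choice: the library may instead reject such a value.

```python
from typing import Optional


def node_budget(max_depth: int, max_num_nodes: Optional[int] = None) -> int:
    """Effective limit on the number of feature nodes for the given settings.

    Assumption (as in MurTree): max_num_nodes counts feature nodes only. A
    complete binary tree of depth d has 2**d - 1 of them, so that is the cap.
    """
    if max_depth < 0:
        raise ValueError("max_depth must be non-negative")
    full_tree = 2 ** max_depth - 1
    if max_num_nodes is None:
        # No explicit node limit: only the depth constrains the tree
        return full_tree
    if max_num_nodes < 0:
        raise ValueError("max_num_nodes must be non-negative")
    # Illustrative choice: clamp instead of rejecting an oversized limit
    return min(max_num_nodes, full_tree)
```

With the constructor defaults (max_depth=3, max_num_nodes=None), this sketch gives a budget of 7 feature nodes.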

depth() → int

Returns the depth of the tree.

Parameters:: None
Returns:: The depth of the tree.
Return type:: int
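To make the return value of depth() concrete, here is a minimal sketch that computes the depth of a binary tree recursively. The Node class is a hypothetical stand-in, not PyMurTree's internal representation. The sketch assumes that depth counts feature nodes along the longest root-to-leaf path, so a tree consisting of a single leaf has depth 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node; a node with no children is a leaf."""
    feature: Optional[int] = None   # index of the tested feature (None for leaves)
    left: Optional["Node"] = None   # subtree for feature value 0 (assumed)
    right: Optional["Node"] = None  # subtree for feature value 1 (assumed)
    label: Optional[int] = None     # predicted class (leaves only)


def depth(node: Node) -> int:
    """Feature nodes on the longest root-to-leaf path; a lone leaf has depth 0."""
    children = [c for c in (node.left, node.right) if c is not None]
    if not children:
        return 0
    return 1 + max(depth(c) for c in children)
```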

export_dot(out_file: str = '', feature_names: ndarray | None = None, class_names: Dict[int, str] | None = None) → None

Export the decision tree in DOT format for visualization with Graphviz. The DOT representation is written to out_file if given; otherwise it is printed to standard output.

Parameters:

out_file (str, optional) – Name of the output file.
feature_names (numpy.ndarray, optional) – 1D Numpy array that represents the names of the features.
class_names (dict, optional) – Dictionary with int keys and str values that represent the class names.

Return type:

None

Raises:

ValueError – If the fit method has not been called (self.__tree is None).
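When out_file is given, the resulting DOT file can be rendered with Graphviz's dot command-line tool (dot -Tpng tree.dot -o tree.png). The helpers below are a sketch that assumes Graphviz is installed and on PATH. The names dot_command and render_dot are hypothetical, and tree.dot is only an example file name.

```python
import shutil
import subprocess
from pathlib import Path


def dot_command(dot_file: str, fmt: str = "png") -> list:
    """Build a Graphviz command that renders dot_file next to itself (tree.dot -> tree.png)."""
    out_file = Path(dot_file).with_suffix("." + fmt)
    return ["dot", f"-T{fmt}", dot_file, "-o", str(out_file)]


def render_dot(dot_file: str, fmt: str = "png") -> Path:
    """Render a DOT file (e.g. one written by export_dot) with the `dot` executable."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' executable not found on PATH")
    cmd = dot_command(dot_file, fmt)
    # check=True turns a non-zero exit status from Graphviz into an exception
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

For example, after model.export_dot(out_file="tree.dot"), calling render_dot("tree.dot") would write tree.png next to the DOT file.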

export_text(out_file: str = '', feature_names: ndarray | None = None, class_names: Dict[int, str] | None = None) → None

Creates a text representation of all the rules in the decision tree. The text is written to out_file if given; otherwise it is printed to standard output.

Parameters:

out_file (str, optional) – Name of the output file.
feature_names (numpy.ndarray, optional) – 1D Numpy array that represents the names of the features.
class_names (dict, optional) – Dictionary with int keys and str values that represent the class names.

Return type:

None

Raises:

ValueError – If the fit method has not been called (self.__tree is None).
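The feature_names and class_names arguments take a 1D NumPy array and an int-to-str dictionary. The sketch below builds both in those types and wraps export_text in a hypothetical write_rules helper that catches the ValueError raised for an unfitted model. The names are made up, and one feature name per column of the training matrix is an assumption.

```python
import numpy as np

# Example arguments in the documented types (the names are made up):
# a 1D array with one name per feature column, and a dict from class value to label.
feature_names = np.array(["age > 30", "smoker", "bmi > 25"])
class_names = {0: "healthy", 1: "at risk"}


def write_rules(model, path, feature_names=None, class_names=None):
    """Write the rules of `model` to `path`; return False if it is not fitted yet.

    Hypothetical helper: it relies only on export_text raising ValueError
    when fit() has not been called.
    """
    try:
        model.export_text(out_file=path, feature_names=feature_names,
                          class_names=class_names)
    except ValueError:
        return False
    return True
```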

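fit(x, y, time=None, max_depth=None, max_num_nodes=None, sparse_coefficient=None, verbose=None, all_trees=None, incremental_frequency=None, similarity_lower_bound=None, node_selection=None, feature_ordering=None, random_seed=None, cache_type=None, duplicate_factor=None) → None

Fits a PyMurTree model to the given training data.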

Parameters:

x (numpy.ndarray) – A 2D array that represents the input features of the training data.
y (numpy.ndarray) – A 1D array that represents the target variable of the training data.
time (int, optional) – The maximum time budget in seconds allowed for fitting the model. Defaults to None.
max_depth (int, optional) – The maximum depth of the tree. Defaults to None.
max_num_nodes (int, optional) – The maximum number of nodes in the tree. Defaults to None.
sparse_coefficient (float, optional) – The sparsity coefficient used for tree pruning. Defaults to None.
verbose (bool, optional) – If True, prints the progress of the training process. Defaults to None.
all_trees (bool, optional) – If True, returns all trees generated during the training process. Defaults to None.
incremental_frequency (bool, optional) – If True, uses incremental frequency counting. Defaults to None.
similarity_lower_bound (bool, optional) – If True, uses similarity lower bound pruning. Defaults to None.
node_selection (int, optional) – The method used for node selection. Defaults to None.
feature_ordering (int, optional) – The method used for feature ordering. Defaults to None.
random_seed (int, optional) – The random seed for the training process. Defaults to None.
cache_type (int, optional) – The type of cache used for storing the intermediate results. Defaults to None.
duplicate_factor (int, optional) – The duplicate factor used for parallelization. Defaults to None.

Return type:

None

Raises:

ValueError – If x or y is None, or if they have a different number of rows.
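The ValueError conditions documented for fit() can be checked up front, before any of the time budget is spent. This sketch mirrors the documented rules (x and y not None, matching row counts) plus the 2D/1D shapes stated in the parameter descriptions. check_fit_inputs is a hypothetical helper, not part of PyMurTree.

```python
import numpy as np


def check_fit_inputs(x, y):
    """Raise ValueError for inputs that fit() is documented to reject.

    Hypothetical pre-check: None inputs and mismatched row counts come from the
    documented Raises section; the 2D/1D shape check follows the parameter text.
    """
    if x is None or y is None:
        raise ValueError("x and y must not be None")
    x = np.asarray(x)
    y = np.asarray(y)
    if x.ndim != 2 or y.ndim != 1:
        raise ValueError("x must be a 2D array and y a 1D array")
    if x.shape[0] != y.shape[0]:
        raise ValueError(f"x has {x.shape[0]} rows but y has {y.shape[0]} entries")
```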

Examples

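>>> import numpy as np
>>> from pymurtree.OptimalDecisionTreeClassifier import OptimalDecisionTreeClassifier
>>> model = OptimalDecisionTreeClassifier(max_depth=2)
>>> x_train = np.array([[0, 1], [1, 0], [1, 1], [0, 0]])
>>> y_train = np.array([0, 1, 1, 0])
>>> model.fit(x_train, y_train)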
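The MurTree algorithm operates on binary (0/1) features, so numeric data usually has to be binarized before fitting. The sketch below assumes that requirement applies to PyMurTree as well. It thresholds selected columns and also returns matching feature names that could be passed to export_text or export_dot. The function name binarize and the cut points are illustrative, not part of the library.

```python
import numpy as np


def binarize(x, thresholds):
    """Turn numeric columns into 0/1 features of the form x[:, j] > t.

    `thresholds` maps a column index to the cut points to test on that column.
    Returns the binary matrix and a 1D array of matching feature names.
    """
    x = np.asarray(x, dtype=float)
    columns, names = [], []
    for j, cuts in thresholds.items():
        for t in cuts:
            # One binary feature per (column, cut point) pair
            columns.append((x[:, j] > t).astype(np.int32))
            names.append(f"x{j} > {t}")
    if not columns:
        raise ValueError("thresholds must define at least one cut point")
    return np.column_stack(columns), np.array(names)
```

For example, binarize(np.array([[1, 2], [3, 4]]), {0: [2], 1: [1, 3]}) yields the matrix [[0, 1, 0], [1, 1, 1]] with the names x0 > 2, x1 > 1 and x1 > 3. Choosing good cut points is left to the user.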

num_nodes() → int

Returns the number of nodes in the tree.

Parameters:: None
Returns:: The number of nodes in the tree.
Return type:: int
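This reference does not say whether num_nodes() counts leaves. The sketch below uses a hypothetical Node class. By default it counts feature (decision) nodes, which matches how MurTree's node limit is defined. With include_leaves=True it counts every node instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node; a node with no children is a leaf."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def count_nodes(node: Node, include_leaves: bool = False) -> int:
    """Count feature (decision) nodes, plus leaves if include_leaves is True."""
    children = [c for c in (node.left, node.right) if c is not None]
    if not children:
        # A leaf contributes only when leaves are being counted
        return 1 if include_leaves else 0
    return 1 + sum(count_nodes(c, include_leaves) for c in children)
```

For a complete tree of depth d, this gives 2**d - 1 feature nodes and 2**(d + 1) - 1 nodes in total.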

predict(x: ndarray) → ndarray

Predicts the target variable for the given input features.

Parameters:: x (numpy.ndarray) – A 2D array that represents the input features of the test data. Each row corresponds to an instance, and each column corresponds to a feature.
Returns:: A 1D array that represents the predicted target variable of the test data. The i-th element corresponds to the predicted target variable for the i-th instance in x.
Return type:: numpy.ndarray
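Because predict() returns one label per row of x, its output can be compared element-wise with known labels. Here is a minimal sketch of test-set accuracy, where accuracy is a hypothetical helper and y_test stands for held-out labels:

```python
import numpy as np


def accuracy(y_true, y_pred) -> float:
    """Fraction of instances whose predicted label matches the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # Element-wise comparison gives a boolean array; its mean is the hit rate
    return float(np.mean(y_true == y_pred))
```

Typical use would be accuracy(y_test, model.predict(x_test)).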

score() → int

Returns the misclassification score of the tree.

Parameters:: None
Returns:: The misclassification score of the tree.
Return type:: int
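Assuming score() reports the number of misclassified training instances, the same quantity can be computed for any data set from the output of predict(). The name misclassifications is a hypothetical helper:

```python
import numpy as np


def misclassifications(y_true, y_pred) -> int:
    """Number of instances whose predicted label differs from the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return int(np.count_nonzero(y_true != y_pred))
```

Dividing the result by len(y_true) gives the error rate.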