API reference

class pymurtree.OptimalDecisionTreeClassifier.OptimalDecisionTreeClassifier(time: int = 600, max_depth: int = 3, max_num_nodes: int | None = None, sparse_coefficient: float = 0.0, verbose: bool = False, all_trees: bool = False, incremental_frequency: bool = True, similarity_lower_bound: bool = True, node_selection: int = 0, feature_ordering: int = 0, random_seed: int = 3, cache_type: int = 0, duplicate_factor: int = 1)

OptimalDecisionTreeClassifier is a class that represents a PyMurTree model.
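As a minimal sketch, the constructor defaults listed in the signature above can be collected in a dictionary and selectively overridden before a model is built. The helper names `make_params` and `build_classifier` are hypothetical, and the import path is inferred from the fully qualified class name above rather than confirmed against the package.

```python
# Constructor defaults copied from the class signature above.
CONSTRUCTOR_DEFAULTS = {
    "time": 600,
    "max_depth": 3,
    "max_num_nodes": None,
    "sparse_coefficient": 0.0,
    "verbose": False,
    "all_trees": False,
    "incremental_frequency": True,
    "similarity_lower_bound": True,
    "node_selection": 0,
    "feature_ordering": 0,
    "random_seed": 3,
    "cache_type": 0,
    "duplicate_factor": 1,
}


def make_params(**overrides):
    """Return the default keyword arguments with the given overrides applied."""
    unknown = set(overrides) - set(CONSTRUCTOR_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown parameter(s): {sorted(unknown)}")
    return {**CONSTRUCTOR_DEFAULTS, **overrides}


def build_classifier(**overrides):
    """Build a classifier (hypothetical helper; requires pymurtree)."""
    # Assumption: the import path follows the qualified class name shown above.
    from pymurtree.OptimalDecisionTreeClassifier import OptimalDecisionTreeClassifier

    return OptimalDecisionTreeClassifier(**make_params(**overrides))
```

For instance, `build_classifier(max_depth=4, time=60)` would request a deeper tree with a shorter time budget while leaving the remaining parameters at their documented defaults.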

depth() → int

Returns the depth of the tree.

Parameters:

None

Returns:

The depth of the tree.

Return type:

int

export_dot(out_file: str = '', feature_names: ndarray | None = None, class_names: Dict[int, str] | None = None) → None

Exports the decision tree in DOT format for visualization with Graphviz. The DOT representation is written to out_file if given; otherwise it is printed to standard output.

Parameters:
  • out_file (str, optional) – Name of the output file.

  • feature_names (numpy.ndarray, optional) – 1D NumPy array that represents the names of the features.

  • class_names (dict, optional) – Dictionary with int keys and str values that represent the class names.

Return type:

None

Raises:

ValueError – If the fit method has not been called (self.__tree is None).
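The documented ValueError makes it straightforward to guard an export against an unfitted model. The following is a minimal sketch; `export_dot_if_fitted` is a hypothetical helper, and `model` is assumed to be any object exposing the `export_dot` method described above.

```python
def export_dot_if_fitted(model, out_file):
    """Write the DOT file and return True, or return False if the model is unfitted."""
    try:
        model.export_dot(out_file=out_file)
    except ValueError:
        # Per the reference above, export_dot raises ValueError before fit().
        return False
    return True
```

A file written this way can then be rendered with the Graphviz command-line tool, for example `dot -Tpng tree.dot -o tree.png`.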

export_text(out_file: str = '', feature_names: ndarray | None = None, class_names: Dict[int, str] | None = None) → None

Creates a text representation of all the rules in the decision tree. The text is written to out_file if given; otherwise it is printed to standard output.

Parameters:
  • out_file (str, optional) – Name of the output file.

  • feature_names (numpy.ndarray, optional) – 1D NumPy array that represents the names of the features.

  • class_names (dict, optional) – Dictionary with int keys and str values that represent the class names.

Return type:

None

Raises:

ValueError – If the fit method has not been called (self.__tree is None).
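The class_names argument is a dictionary keyed by integer class label. As a sketch, such a dictionary can be built from an ordered list of names; this assumes the class labels are the integers 0, 1, 2, ... in list order, which the reference does not state. Both helper names are hypothetical.

```python
def class_names_from_labels(labels):
    """Map class indices 0, 1, 2, ... to readable names, in list order."""
    return {index: str(name) for index, name in enumerate(labels)}


def print_rules(model, labels, feature_names=None):
    """Print the tree's rules to standard output with readable class names."""
    # feature_names, if given, is expected to be a 1D NumPy array of strings.
    model.export_text(
        feature_names=feature_names,
        class_names=class_names_from_labels(labels),
    )
```

Leaving out_file at its default keeps the output on standard output, as described above.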

fit(x: ndarray, y: ndarray, time: int | None = None, max_depth: int | None = None, max_num_nodes: int | None = None, sparse_coefficient: float | None = None, verbose: bool | None = None, all_trees: bool | None = None, incremental_frequency: bool | None = None, similarity_lower_bound: bool | None = None, node_selection: int | None = None, feature_ordering: int | None = None, random_seed: int | None = None, cache_type: int | None = None, duplicate_factor: int | None = None) → None

Fits a PyMurTree model to the given training data.

Parameters:
  • x (numpy.ndarray) – A 2D array that represents the input features of the training data.

  • y (numpy.ndarray) – A 1D array that represents the target variable of the training data.

  • time (int, optional) – The maximum time budget in seconds allowed for fitting the model. Defaults to None.

  • max_depth (int, optional) – The maximum depth of the tree. Defaults to None.

  • max_num_nodes (int, optional) – The maximum number of nodes in the tree. Defaults to None.

  • sparse_coefficient (float, optional) – The sparsity coefficient used for tree pruning. Defaults to None.

  • verbose (bool, optional) – If True, prints the progress of the training process. Defaults to None.

  • all_trees (bool, optional) – If True, returns all trees generated during the training process. Defaults to None.

  • incremental_frequency (bool, optional) – If True, uses incremental frequency counting. Defaults to None.

  • similarity_lower_bound (bool, optional) – If True, uses similarity lower bound pruning. Defaults to None.

  • node_selection (int, optional) – The method used for node selection. Defaults to None.

  • feature_ordering (int, optional) – The method used for feature ordering. Defaults to None.

  • random_seed (int, optional) – The random seed for the training process. Defaults to None.

  • cache_type (int, optional) – The type of cache used for storing the intermediate results. Defaults to None.

  • duplicate_factor (int, optional) – The duplicate factor used for parallelization. Defaults to None.

Return type:

None

Raises:

ValueError – If x or y is None, or if they have a different number of rows.
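The two documented error conditions can also be checked before calling fit, which gives a more specific message. This is only a sketch that mirrors the conditions listed above; `check_training_data` is a hypothetical helper and does not reproduce the library's own validation.

```python
def check_training_data(x, y):
    """Raise ValueError under the conditions documented for fit()."""
    if x is None or y is None:
        raise ValueError("x and y must not be None")
    # len() of a 2D array (or a list of rows) is its number of rows.
    if len(x) != len(y):
        raise ValueError(f"x has {len(x)} rows but y has {len(y)} entries")
```

Because the helper relies only on `len()`, it accepts NumPy arrays as well as plain lists of rows.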

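Examples

>>> import numpy as np
>>> from pymurtree.OptimalDecisionTreeClassifier import OptimalDecisionTreeClassifier
>>> model = OptimalDecisionTreeClassifier()
>>> x_train = np.array([[0, 1], [1, 0]])
>>> y_train = np.array([0, 1])
>>> model.fit(x_train, y_train)

num_nodes() → int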

Returns the number of nodes in the tree.

Parameters:

None

Returns:

The number of nodes in the tree.

Return type:

int
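Together, depth() and num_nodes() describe the size of the fitted tree. The sketch below compares them with the limits requested at construction time. It assumes that depth() and num_nodes() count in the same way as max_depth and max_num_nodes, which this reference does not confirm. Both helper names are hypothetical.

```python
def tree_size(model):
    """Return the fitted tree's depth and node count as a dictionary."""
    return {"depth": model.depth(), "num_nodes": model.num_nodes()}


def within_limits(model, max_depth, max_num_nodes=None):
    """Check the fitted tree against the requested size limits."""
    size = tree_size(model)
    if size["depth"] > max_depth:
        return False
    # A max_num_nodes of None is treated here as "no node limit".
    return max_num_nodes is None or size["num_nodes"] <= max_num_nodes
```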

predict(x: ndarray) → ndarray

Predicts the target variable for the given input features.

Parameters:

x (numpy.ndarray) – A 2D array that represents the input features of the test data. Each row corresponds to an instance, and each column corresponds to a feature.

Returns:

A 1D array that represents the predicted target variable of the test data. The i-th element in this array corresponds to the predicted target variable for the i-th instance in x.

Return type:

numpy.ndarray
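Because the i-th prediction corresponds to the i-th row of x, predictions can be compared element by element with held-out labels. A minimal sketch follows; `accuracy` is a hypothetical helper, and `model` is assumed to be any fitted object exposing the `predict` method described above.

```python
def accuracy(model, x_test, y_test):
    """Return the fraction of test instances whose prediction matches the label."""
    predictions = model.predict(x_test)
    # Pair the i-th prediction with the i-th true label and count matches.
    correct = sum(int(p == t) for p, t in zip(predictions, y_test))
    return correct / len(y_test)
```

The helper iterates element by element, so it works with NumPy arrays and plain lists alike.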

score() → int

Returns the misclassification score of the tree.

Parameters:

None

Returns:

The misclassification score of the tree.

Return type:

int