Machine learning (ML). Training mathematical models with the Decision Tree algorithm (regression and classification)

Button [Training and applying a mathematical model using the decision tree method (regression and classification).]

Decision trees are classified as supervised machine learning (ML) algorithms and are used to predict both continuous (regression) and categorical (classification) output variables. This feature of our software makes machine learning technology accessible to a wide range of users.

You can download example structured spreadsheet files for building a mathematical model and making predictions with the Decision Tree algorithm: XLSX for regression and XLSX for classification.

Structured data can be imported from the following spreadsheet file formats: Excel workbook (*.xlsx); Excel binary workbook (*.xlsb); OpenDocument Spreadsheet (*.ods).

Where is it used?

Data analysis using the decision tree method can be used:

as a cost-, time-, and resource-efficient alternative to "Planning experiments" when searching for optimal settings of input parameters;
for preliminary or alternative assessment of output parameters when measuring them requires expensive and/or time-consuming tests;
for expert decision support systems (DSS), where decisions carry a risk of human error.

Data Model Files

Our software can use trained Decision Tree models built with the scikit-learn library, including models created on other computers and saved to files (*.sav).
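The following is a minimal sketch of how a scikit-learn Decision Tree model might be written to a *.sav file on one computer and read back on another. It assumes the file is a standard Python pickle, which is a common convention for *.sav model files but is not confirmed here; the data values and the file name are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.tree import DecisionTreeRegressor

# Tiny illustrative data set (hypothetical values, not taken from the application).
X = [[1.0], [2.0], [3.0], [4.0]]
y = [10.0, 20.0, 30.0, 40.0]

model = DecisionTreeRegressor(random_state=0).fit(X, y)

# Save the fitted model; the file name is an assumption made for this sketch.
model_path = Path(tempfile.gettempdir()) / "decision_tree_regression.sav"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load the model again, as would be done on another computer.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

print(restored.predict([[2.0]]))
```

Pickle files can execute code when loaded, so only open model files from sources you trust, and keep the scikit-learn version used for loading compatible with the one used for saving.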

Decision tree regression for continuous quantities (measurements) as input and output

Example use case from one of our clients:
You manage design development and assembly production, and you order parts from a large metalworking center. The number of cost-calculation requests you send to the metal center significantly exceeds the number of orders you actually place with it. The managers of the metal center have become reluctant and slow to respond to your requests. You ask the metal center to share its calculation algorithm so that you can quickly estimate the cost of its work without distracting its employees, but, naturally, you are refused.

Your order history, with the quantity and technical characteristics of the parts (which are the basis for calculating the cost of the metal center's services) and the quoted cost, is an excellent basis for building a regression model and using it to obtain estimates very close to the metal center's prices without sending calculation requests. The Decision Tree (regression) machine learning function of the Shewhart control charts +AI software reports an assessment of the accuracy of the mathematical model as it is built. The “Actual vs. Predicted values” graph of the error in the metal center prices predicted by the model shows the possible risks in both the “dangerous” and “safe” directions, which you can take into account in your pricing. To update the model, you can add the orders that the metal center actually fulfills.
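The workflow described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the application's internal code: the order history is synthetic, and the column names (`quantity`, `mass_kg`, `operations`) and the pricing rule are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic "order history": every column and the pricing rule are hypothetical.
rng = np.random.default_rng(0)
n_orders = 400
quantity = rng.integers(1, 200, n_orders)      # parts per order
mass_kg = rng.uniform(0.1, 25.0, n_orders)     # mass of one part
operations = rng.integers(1, 8, n_orders)      # machining operations per part
price = 50 + quantity * (2.0 * mass_kg + 15.0 * operations)
price = price + rng.normal(0.0, 50.0, n_orders)  # unexplained variation

X = np.column_stack([quantity, mass_kg, operations])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
predicted = model.predict(X_test)

# Accuracy on orders the model has not seen during training.
r2 = r2_score(y_test, predicted)
mae = mean_absolute_error(y_test, predicted)
errors = y_test - predicted  # positive: the model underestimated the price

print(f"R^2 = {r2:.3f}, MAE = {mae:.1f}")
print(f"largest underestimate = {errors.max():.1f}")
print(f"largest overestimate  = {-errors.min():.1f}")
```

The largest under- and overestimates correspond to the two risk directions visible on an actual-versus-predicted graph; which of them counts as "dangerous" depends on how the estimate is used in pricing.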

Figure 1. Window for accessing machine learning (ML) functions. A list of drop-down menus is displayed when you hover the mouse over the main menu item.

Figure 2. Machine learning (ML) functions window. A tooltip is displayed when you hover the mouse over the button to go to the functions of decision trees (regression and classification).

Figure 3. Window for transition to functions for managing machine learning algorithms using decision trees (regression and classification). A drop-down tooltip appears when you hover your mouse over the button to go to the decision tree algorithms control panel (regression).

Figure 4. Window of the control function for the machine learning algorithm using the decision tree method (regression). A drop-down list is opened to select the predicted variable.

Figure 5. Window of the control function for the machine learning algorithm using the decision tree method (regression). The checkbox that removes the limit on the depth of the decision tree is selected. The checkbox that saves the model to the corresponding application folder (SCCPython\resources\Model_AI) whenever model parameters change is also selected.

Figure 6. Window of the control function for the machine learning algorithm using the decision tree method (regression). A drop-down list with types of mathematical model evaluation graphs is opened. The plot area displays the "Actual vs. Predicted Values" graph for the test data set.

Figure 7. Window of the function for controlling the application of the mathematical model of the decision tree (regression). The graph is zoomed on the X axis to show fewer points (from 140 to 196) using the Zoom tool below the graph. A tooltip is displayed when you hover over the button that opens the selection of a trained mathematical model, which is then applied to new data chosen in the following steps.

Figure 8. Window of the function for managing the selection of the mathematical model of the Decision Tree (regression). A drop-down tooltip is displayed when you hover over the field with the path to the selected trained mathematical model.

Figure 9. Window of the function for managing the selection of the mathematical model of the Decision Tree (regression). A drop-down tooltip is displayed when you hover the cursor over the button to go to the function of selecting data to use in a mathematical model.

Figure 10. Window of the function for managing the selection of a file with data and applying the mathematical model of the Decision Tree (regression) to them. A drop-down tooltip appears when you hover over the "Predict Results" button.

Figure 11. Window for controlling the application of a decision tree mathematical model (regression) to imported data. When you click the "Predict Results" button, the model is applied to the imported data. When the operation completes, a notification window opens offering to save the predicted values in the Excel file with the source data.

If your imported data contains one or more explanatory variable columns with categorical values, such as [male, female], the software automatically applies One-Hot Encoding, which replaces each such column with new numeric columns coded as [0, 1]. The one-hot encoded data is saved in the original [xlsx] file on a new sheet.
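As an illustration of what One-Hot Encoding produces, here is a short sketch using `pandas.get_dummies`. It is not necessarily the procedure the application runs internally; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical imported data with one categorical explanatory variable.
data = pd.DataFrame(
    {
        "age": [34, 51, 29, 46],
        "sex": ["male", "female", "female", "male"],
    }
)

# Replace the categorical column with one 0/1 column per category.
encoded = pd.get_dummies(data, columns=["sex"], dtype=int)
print(encoded)
```

The single `sex` column becomes two numeric columns, `sex_female` and `sex_male`, each holding 1 for rows of that category and 0 otherwise.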

Reasons why a mathematical model built with the Decision Tree (regression) method can have low accuracy

Limited data: If the training data set is small or does not carry enough information about the target variable, the model cannot learn an accurate predictive relationship.
Incorrect feature selection: If inappropriate or irrelevant features are included in the model, its accuracy may suffer. Selecting the right features and cleaning the data of outliers and noise is very important for achieving high accuracy of the regression model.
Underfitting: If a model is not complex enough to approximate complex relationships in the data, for example because the tree is too shallow, it may produce poor prediction accuracy. In such cases, it may be necessary to increase the depth of the decision tree or use other machine learning techniques.
Overfitting: If a model has too many parameters or a decision tree that is too deep, it may overfit the training data and perform poorly on new data. One way to combat overfitting is to use regularization, such as pruning the tree or constraining model parameters.
Unbalanced data: If some ranges of target variable values are represented by far fewer examples than others in the training data set, the model may be inaccurate in those ranges. In such cases, example weighting techniques may need to be used.
Noise in the data: Noise or random outliers in the data can cause the regression model to have low accuracy. It is necessary to conduct preliminary data analysis and remove outliers, as well as apply methods that reduce the influence of noise, such as smoothing or filtering the data.
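The effect of tree depth on underfitting and overfitting can be checked by comparing scores on training and test data. The sketch below uses synthetic noisy data and scikit-learn's `max_depth` parameter, where `None` means no depth limit; the data and the chosen depth of 4 are assumptions made for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy measurements (hypothetical): a sine curve plus random noise.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 10.0, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.3, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scores = {}
for depth in (None, 4):  # None: unlimited depth; 4: depth-limited tree
    model = DecisionTreeRegressor(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    scores[depth] = (
        model.score(X_train, y_train),  # R^2 on the training data
        model.score(X_test, y_test),    # R^2 on unseen test data
    )
    print(f"max_depth={depth}: train R^2={scores[depth][0]:.3f}, "
          f"test R^2={scores[depth][1]:.3f}")
```

A tree with unlimited depth reproduces the noisy training data almost exactly, so a large gap between its training and test scores is a typical sign of overfitting. Limiting the depth trades some training accuracy for a model that follows the noise less closely.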

Decision tree classification for continuous quantities (measurements) as input and categorical data (classes) as output

Example 1. Based on the results of a patient's clinical tests, a diagnosis must be made, for example, sick/not sick.

Example 2. An object or event must be assigned to a specific class (type) based on measurements of many of its characteristics (properties).
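A minimal sketch of the first example with scikit-learn is shown below. The two test results (`marker_a`, `marker_b`), their distributions, and the rule that assigns the labels are hypothetical and serve only to produce data for the illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic clinical test results (hypothetical markers and labeling rule).
rng = np.random.default_rng(1)
n_patients = 300
marker_a = rng.normal(5.0, 1.0, n_patients)
marker_b = rng.normal(120.0, 15.0, n_patients)
labels = np.where((marker_a > 5.5) & (marker_b > 115.0), "sick", "not sick")

X = np.column_stack([marker_a, marker_b])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=1, stratify=labels
)

model = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy on test data = {accuracy:.3f}")

# Classify a new patient from continuous measurements.
print(model.predict([[6.2, 130.0]]))
```

Continuous measurements go in and a class label comes out. The accuracy is measured on test data that was held out from training.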

Figure 12. Window of the function for managing training and evaluation of the mathematical model of the decision tree (classification). A drop-down tooltip is displayed when you hover the mouse over the button to go to the control panel for decision tree algorithms using the classification method.

Figure 13. Window of the function for managing training and evaluation of the mathematical model of the decision tree (classification). The checkbox that removes the limit on the depth of the decision tree is selected. The checkbox that saves the model to the corresponding application folder (SCCPython\resources\Model_AI) whenever model parameters change is also selected. A drop-down list is open with the types of evaluation graphs for the trained model, evaluated on test data that was not included in the training data set.

Figure 14. Window of the function for managing training and evaluation of the mathematical model of the decision tree (classification) with “confusion matrix” graphs. A tooltip is displayed when you hover the mouse cursor over the button that opens the control panel for selecting a trained model, to which data is imported in the next step.

Figure 15. Window of the function for selecting a trained mathematical model of the Decision Tree (classification) to apply to user-selected data in the next step. A tooltip is displayed when you hover your mouse over the button that opens the data selection control panel, where the selected trained model is applied to the data.

Figure 16. Window of the function of applying a trained mathematical decision tree model (classification) to user-selected data. A tooltip appears when you hover your mouse over the "Predict Results" button.

Figure 17. Window of the function of applying a trained mathematical decision tree model (classification) to user-selected data. When you click the "Predict Results" button, the model is applied to the imported data. When the operation completes, a notification window opens offering to save the predicted values in an Excel file.

Figure 18. Window of the function for managing training and evaluation of the mathematical model of the decision tree (classification). The graph area displays enlarged confusion matrices, the second type of graph for the Decision Tree (classification).
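For readers unfamiliar with confusion matrices, the following sketch shows how one is computed with scikit-learn's `confusion_matrix`. The actual and predicted labels are hypothetical values written out by hand, not results produced by the application.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted classes for eight test cases.
actual = ["sick", "sick", "sick", "not sick",
          "not sick", "not sick", "not sick", "sick"]
predicted = ["sick", "not sick", "sick", "not sick",
             "not sick", "sick", "not sick", "sick"]

# Rows are actual classes, columns are predicted classes,
# both in the order given by `labels`.
class_order = ["not sick", "sick"]
matrix = confusion_matrix(actual, predicted, labels=class_order)
print(matrix)
```

The diagonal cells count correct predictions (3 "not sick" and 3 "sick" here). The off-diagonal cells count the two kinds of error: one "not sick" case predicted as "sick" and one "sick" case predicted as "not sick".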

Reasons why a mathematical model built with the Decision Tree (classification) method can have low accuracy

Insufficient amount of data: If the model is trained on a small amount of data, its accuracy may be low. The more representative data is available for training, the more accurate the model can be.
Inadequate feature selection: If inappropriate or irrelevant features are included in the model, its accuracy can decrease. It is important to select the features that are most informative about the target variable to achieve high classification accuracy.
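Insufficient data preprocessing: Incorrect data preparation, such as unhandled missing values, mislabeled examples, or uncleaned outliers, can lead to poor model accuracy. It is important to carry out the necessary preprocessing steps, such as cleaning the data of outliers and filling in missing values. Scaling or normalization matters less here, because decision tree splits do not depend on the scale of a feature.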
Overfitting: If a model is too complex or has too many parameters, it may overfit the training data and perform poorly on new data. Overfitting can be reduced, for example, by limiting the tree depth or using regularization.
Class imbalance: If the classes in the data are imbalanced, that is, one class dominates the others, the model may tend to predict the dominant class and show low accuracy on less represented classes. In such cases, class balancing techniques such as upsampling or downsampling can improve the accuracy of the model.
Incorrect selection of decision rules: If the decision rules that split the data at the tree nodes separate the classes poorly, the accuracy of the model will be low. It is important to select decision rules that separate the classes as accurately as possible.
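The upsampling technique mentioned for class imbalance can be sketched with `sklearn.utils.resample`, which draws examples with replacement. The data below is synthetic, and the class sizes (90 and 10) and tree depth are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

# Synthetic imbalanced training data (hypothetical): 90 examples of class 0
# and only 10 examples of class 1.
rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(90, 2))
X_minor = rng.normal(2.0, 1.0, size=(10, 2))

# Upsample the minority class by drawing with replacement
# until it matches the size of the majority class.
X_minor_up = resample(
    X_minor, replace=True, n_samples=len(X_major), random_state=0
)

X_balanced = np.vstack([X_major, X_minor_up])
y_balanced = np.array([0] * len(X_major) + [1] * len(X_minor_up))

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_balanced, y_balanced)
print(np.bincount(y_balanced))
```

Resampling should be applied to the training data only, after the test data has been set aside, so that copies of the same example do not appear in both sets.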

Shewhart control charts PRO-Analyst +AI for Windows, Mac, Linux. Register of Russian software (entry No. 18857 dated 09/05/2023).