
Discussion of methods
This study evaluates both pre-trained models (ResNet152V2, EfficientNetV2B0, InceptionResNetV2, and MobileNetV3) and a custom-designed CNN (FCDS-CNN) for skin lesion classification. The pre-trained models offer diverse architectures, balancing accuracy, efficiency, and computational cost. ResNet152V2 provides high accuracy but demands more resources, while EfficientNetV2B0 and MobileNetV3 prioritize efficiency. InceptionResNetV2 excels in feature extraction. However, these models may not be optimally tuned for dermoscopic images. Therefore, FCDS-CNN is proposed, aiming for high accuracy with a less complex architecture tailored specifically for skin cancer detection. Comparing these models allows us to assess the trade-offs between performance, computational demands, and the benefits of task-specific model design. Key considerations include accuracy, computational cost, convergence speed, and handling class imbalance. This multifaceted approach aims to identify the most suitable architecture for skin cancer detection from dermoscopic images.
Data collection and preprocessing
Identification of skin cancer is one of the most vital areas of study in medical science, and dermoscopic imaging provides detailed views of skin lesion features suitable for machine learning. This study uses a dataset of 10,015 dermoscopic images across seven classes. In the H&E-stained sections, the following findings were classified: 66 actinic keratoses (AK), 42 basal cell carcinomas (BCC), 39 benign keratoses (BKL), 50 dermatofibromas (DF), 73 melanocytic nevi (NV), 17 melanomas (MEL), and 14 vascular lesions (VASC). These classes range from harmless lesions to life-threatening cancers such as melanoma. A further challenge of this dataset is that the number of samples differs considerably between classes, which complicates improving the model's recognition rate. Hosny et al. emphasized, in their work on melanoma's aggressiveness, that the disease should be detected early. Esteva et al.40 likewise underline that the quality of dermoscopic images should be enhanced to improve identification, since the lesions under examination may look alike. According to Bansal et al.44, classification can be improved with more sophisticated approaches such as transfer learning and ensemble methods, though these bring their own challenges. Such findings provide background on the importance of using advanced methods in skin cancer diagnosis.
The study identifies various skin diseases, including AK, BCC, BKL, DF, MEL, NV, and VASC, each with unique characteristics. It emphasizes the importance of using advanced methods to accurately identify these diseases across a variety of dermoscopic images, as shown in Fig. 1a-g.
Dataset distribution
Figure 2 below illustrates the distribution of images within the dataset, divided into two key categories: the training and testing datasets. The training dataset, represented in light coral, accounts for 90% of the total images, equating to 9,013 images. This substantial portion is used to train the machine learning models, enabling them to learn the patterns, features, and characteristics associated with different classes of skin lesions. Conversely, the testing dataset, shown in light green, comprises 10% of the total images, or 1,002 images. This smaller set measures how well the model performs on examples it has not seen before. The split follows common machine learning practice: the bulk of the data goes to training, with a smaller held-out portion reserved for testing. This balance yields reliable evaluation results while guarding against both underfitting and overfitting. Figure 2 shows the distribution between the training and testing datasets, and Table 2 provides a dataset overview, including image types, diameter, and the training and testing distribution.
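As a concrete illustration, a stratified 90/10 partition like the one described above can be produced as in the minimal sketch below; the `images` and `labels` arrays are hypothetical stand-ins for the loaded dataset, and the fixed random seed is arbitrary rather than the study's actual setting.

```python
from sklearn.model_selection import train_test_split

# `images` and `labels` are assumed, hypothetical arrays holding the 10,015
# dermoscopic images and their class labels. `stratify=labels` keeps the
# seven class proportions similar in both subsets, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.10, stratify=labels, random_state=42)

print(len(X_train), len(X_test))  # roughly 9,013 and 1,002 images
```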
Distribution chart of training and testing dataset.
Class distribution
This study investigates skin cancer detection using a dataset categorized into seven classes: AK, DF, NV, MEL, VASC, BCC, and BKL. As visualized through the pie charts, the dataset reveals a significant class imbalance, with the NV class dominating both the training (Fig. 3a, 66.9%) and testing (Fig. 3b, 67.0%) datasets. Classes like MEL and BKL are moderately represented, while AK and DF are particularly scarce, complicating model training and evaluation. Despite this imbalance, the dataset is valuable due to its diverse representation of skin lesion types, providing a robust foundation for developing models capable of distinguishing between various conditions. The consistent distribution between training and testing sets ensures that models are evaluated under realistic conditions. However, the imbalance poses a risk of model bias toward the majority class. Implementing strategies like data augmentation, class weighting, or oversampling is crucial to ensure balanced and accurate detection across all classes. Figure 3a-b shows the number of images per class in the training and testing datasets.
Distribution of classes in training and testing dataset.
Preprocessing
Data preprocessing is a crucial step in deep learning, particularly when the inputs are images and the goal is classification. It encompasses the techniques that bring raw data into a form suitable for feeding to neural networks; it improves data quality, removes noise, and standardizes the dataset, all of which enhance model performance and accuracy. For the pre-trained models (ResNet152V2, MobileNetV3, InceptionResNetV2, and EfficientNet), preprocessing involved resizing all images to a common size of 224×224 pixels to match the models' expected input dimensions, ensuring consistency and efficient computation during training. Normalization was also performed, scaling the pixel values to a specific range (e.g., 0-1) to standardize the dataset and aid model convergence. For the FCDS-CNN, the input images were likewise resized to 224×224 pixels, maintaining consistency across all models.
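To make the pipeline concrete, the sketch below shows one minimal way to apply this preprocessing with TensorFlow; the function name and the tf.data usage are illustrative assumptions rather than the study's exact code.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)  # common input size for every model in this study

def preprocess_image(image, label):
    """Resize to 224x224 and scale pixel values into the [0, 1] range."""
    image = tf.image.resize(image, IMG_SIZE)
    image = tf.cast(image, tf.float32) / 255.0  # normalization step
    return image, label

# Example usage (assumes `raw_ds` yields (image, label) pairs, e.g. from
# tf.keras.utils.image_dataset_from_directory with batch_size=None):
# train_ds = raw_ds.map(preprocess_image).batch(32)
```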
Class weighting
Another technique used in deep learning to address imbalanced datasets is class weighting, which gives more weight to the minority classes during training so that the model reduces its bias toward the majority classes. This was implemented by calculating the inverse frequency of each class in the training set and using these values as weights during model training. Class weighting was applied to the FCDS-CNN and the pre-trained models (ResNet152V2, MobileNetV3, InceptionResNetV2, and EfficientNet), which is particularly relevant for skin cancer detection, where some types of cancer are rare. The technique improved the recall and precision of the minority classes in all models, boosting overall prediction quality. The main benefit is that class weighting emphasizes the rarer types of cancer without overlooking crucial common cases, leading to generally high model performance. All the models benefited from this approach, but the FCDS-CNN showed the most substantial improvements, indicating that class weighting is quite effective for improving the accuracy and reliability of skin cancer detection models across different classes.
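A minimal sketch of inverse-frequency class weighting is given below. The exact normalization the study used is not stated, so the common total/(n_classes × count) form is assumed, and `train_labels` is a hypothetical array of integer class labels.

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Weight each class by the inverse of its frequency so that rare
    lesion types contribute more to the training loss."""
    classes, counts = np.unique(labels, return_counts=True)
    # assumed normalization: total / (n_classes * count_c)
    weights = counts.sum() / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Example usage with Keras (the dict is passed to model.fit):
# class_weight = inverse_frequency_weights(train_labels)
# model.fit(train_ds, class_weight=class_weight, ...)
```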
Data augmentation
Data augmentation techniques are essential in deep learning, as they enlarge the variety of the training set by rotating, flipping, scaling, and zooming existing pictures, increasing the model's capacity to generalize to real data. These transformations were applied randomly to the training images using a library like TensorFlow's ImageDataGenerator, with specific ranges chosen empirically (e.g., ±20-degree rotation, 0.9-1.1 scaling). No synthetic data generation was used. Data augmentation is crucial for datasets where some classes are under-represented, such as the skin cancer detection dataset used in this study. Through augmentation, more samples can be produced for the minority classes, reducing the model's tendency to learn only from the samples of the more common classes. This approach proved highly practical for both the FCDS-CNN and the pre-trained models such as ResNet152V2, MobileNetV3, InceptionResNetV2, and EfficientNet. By representing all classes in the training process, data augmentation enhances the precision of the model's skin cancer predictions.
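The sketch below configures ImageDataGenerator with the ranges reported above; treat it as an assumed reconstruction, since the study's generator settings beyond those ranges (e.g., fill mode) are not given.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Ranges follow those reported in the text; other settings are defaults.
train_datagen = ImageDataGenerator(
    rotation_range=20,       # random rotations up to +/-20 degrees
    horizontal_flip=True,    # random horizontal flips
    vertical_flip=True,      # random vertical flips
    zoom_range=[0.9, 1.1],   # random zoom between 90% and 110%
    rescale=1.0 / 255.0,     # pixel normalization to [0, 1]
)

# Example usage (assumes class-labeled subfolders under data/train):
# train_gen = train_datagen.flow_from_directory(
#     "data/train", target_size=(224, 224), batch_size=32)
```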
Hyperparameter tuning and training details
The Adam optimizer was employed to train all models due to its effectiveness in handling sparse gradients and its adaptive learning rate capabilities. The initial learning rate was set to 0.001, and a learning rate schedule (e.g., ReduceLROnPlateau) was implemented to dynamically adjust the learning rate during training based on validation performance. This helped fine-tune the learning process and prevent premature convergence at local minima. The categorical cross-entropy loss function was used, as it is suitable for multi-class classification problems. A batch size of 32 was chosen empirically, balancing training speed and memory usage given the available computational resources (16GB RAM, Intel Core i7, using Google Colab for training with access to GPUs). The dataset was randomly split into training (90%) and validation (10%) sets, ensuring a stratified distribution of classes in both sets to mitigate potential bias. The number of training epochs varied by model, ranging from 30 to 40, determined using early stopping based on validation accuracy. Specifically, training was halted if the validation accuracy did not improve for several consecutive epochs (e.g., 5), preventing overfitting and ensuring optimal generalization performance. For FCDS-CNN, data augmentation techniques, including random rotations (±20 degrees), horizontal/vertical flips, and random zoom (0.9-1.1), were implemented using TensorFlow's ImageDataGenerator. Class weights, calculated as the inverse class frequency, were applied during training to address the class imbalance issue, as detailed in Section 3.5. The models were implemented using TensorFlow [version number] and Keras. Model performance was evaluated using accuracy, precision, recall, F1-score, and Area Under the Curve (AUC), calculated on the held-out test set (10% of the data). These metrics comprehensively evaluate the models' classification performance, particularly concerning sensitivity and specificity.
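These settings can be wired together as in the following sketch. It is a plausible reconstruction rather than the study's code: the ReduceLROnPlateau factor and patience are assumptions, as the text does not report them.

```python
import tensorflow as tf

def compile_and_train(model, train_ds, val_ds, class_weight=None):
    """Compile and train a model with the settings reported in the text."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # initial LR 0.001
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    callbacks = [
        # adjust the learning rate when validation accuracy plateaus
        # (factor and patience are assumptions, not reported values)
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_accuracy", factor=0.5, patience=3),
        # stop after 5 stagnant epochs and keep the best weights
        tf.keras.callbacks.EarlyStopping(
            monitor="val_accuracy", patience=5, restore_best_weights=True),
    ]
    return model.fit(train_ds, validation_data=val_ds, epochs=40,
                     class_weight=class_weight, callbacks=callbacks)
```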
Proposed deep learning-based models
This research proposed five deep learning models for skin cancer detection: four pre-trained models (ResNet152V2, MobileNetV3, EfficientNetV2B0, and InceptionResNetV2) and FCDS-CNN, which was created from scratch. The FCDS-CNN uses a softmax activation function in its output layer for multi-class classification. When fine-tuned on the domain-specific HAM dataset, ResNet50 and other models have classified skin lesions better than traditional machine learning models45. Transfer learning, in which a network already trained on a large dataset is fine-tuned rather than trained from scratch, has also been successful in skin cancer detection, especially with small datasets46. While exploring different activation functions could be a valuable extension, this study focused on comparing different architectures for skin lesion classification. Deep learning, a subcategory of machine learning, has dramatically improved the performance of medical image classification, especially in skin disorders8. Models such as InceptionV3 and Xception have been used in dermatological image classification with high accuracy, but they are computationally intensive and prone to overfitting47. The custom model developed here addresses issues with other methods, such as computational complexity and unbalanced datasets, and offers advantages in cost, speed, and reliability. It can efficiently provide doctors and patients with early skin cancer detection and is well suited to areas lacking funds for more expensive equipment.
ResNet152V2
In this study, one of the strategies used for skin cancer detection was the pre-trained ResNet152V2 model. This deep learning architecture belongs to the ResNet family, recognized for its depth and performance on challenging image classification problems. ResNet, short for Residual Networks, is built on the concept of residual learning, which adopts identity shortcuts that skip one or more layers. This architecture makes it possible to train much deeper networks than standard deep networks, which struggle with the vanishing gradient problem.
ResNet152V2, in particular, is a 152-layer deep network, giving it the capacity to learn detailed image features. This depth makes it well suited to tasks such as medical image analysis, where slight differences between images are significant in detecting disease. The benefits of ResNet152V2 include its strong accuracy across diverse image classification problems, its ability to be trained on various datasets, and the many articles and forums covering its fine-tuning and integration. Nonetheless, ResNet152V2 also has limitations: its deep architecture, while effective, consumes abundant computational resources, causing issues on lower-performance hardware, and the model's intricacy lengthens inference time, which may be undesirable in time-sensitive settings. These factors make it relatively slow for real-time applications or use in resource-limited environments.
By contrast, the custom-built FCDS-CNN developed in this study was designed explicitly for skin cancer detection and achieved a higher accuracy of 96% compared with the 91% obtained using ResNet152V2. The FCDS-CNN balances performance and resource use, making it preferable when quick and accurate diagnoses are needed, especially in poorly funded contexts. ResNet152V2 remains a proven performer on general image classification tasks, but the task-specific design and flexibility of the FCDS-CNN make it more suitable for this particular task and easier to deploy for early skin cancer detection.
The architecture of this model as applied in the current study for skin cancer identification, together with its process flow, is depicted in Fig. 4. The figure shows how the model processes input images of size 224×224×3 through the ResNet152V2 backbone, which extracts latent semantic features. The output from ResNet152V2 is \(7\times 7 \times 2048\), followed by a Global Average Pooling layer yielding 2048 units. Dense layers with 1,024 and 512 units follow, each accompanied by a batch normalization layer and a dropout layer for enhanced learning and reduced overfitting. Lastly, the model produces prediction probabilities over seven classes via a final dense layer of seven neurons.
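A minimal Keras sketch of this head is shown below, assuming ImageNet weights for the backbone; the dropout rate and the dense-layer activations are assumptions, as the text specifies only the layer sizes and their ordering.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_resnet152v2_classifier(num_classes=7, input_shape=(224, 224, 3)):
    """ResNet152V2 backbone plus the head described in Fig. 4:
    GAP -> Dense 1024 -> Dense 512 -> softmax over seven classes."""
    backbone = tf.keras.applications.ResNet152V2(
        include_top=False, weights="imagenet", input_shape=input_shape)
    x = layers.GlobalAveragePooling2D()(backbone.output)  # 7x7x2048 -> 2048
    for units in (1024, 512):
        x = layers.Dense(units, activation="relu")(x)  # activation assumed
        x = layers.BatchNormalization()(x)
        x = layers.Dropout(0.5)(x)                     # rate assumed
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(backbone.input, outputs)

model = build_resnet152v2_classifier()
```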
Table 3 shows the detailed architecture of the ResNet152V2 model, along with the output shapes, layers, and number of parameters connected to each. The parameter count shows that ResNet152V2 has more than 60 million parameters, most of which are trainable. Being much deeper and more complex than the other architectures, ResNet152V2 excels at extracting dense features from images, which is vital for classification in medical imaging. However, its larger parameter count makes it computationally intensive compared to the customized CNN model designed in this study.
Architecture of ResNet152V2 model.
EfficientNetV2B0
Another model we adopted for identifying skin cancer was the pre-trained EfficientNetV2B0. This member of the EfficientNet series is known for high performance at relatively low computational cost: it is designed to be as accurate as possible with the smallest feasible number of parameters and computations, making it attractive where time and resource use are paramount. EfficientNetV2B0 is the smallest model in the EfficientNetV2 family and incorporates improvements in network scaling that cut computational cost while performing well on various image classification tasks. One of its significant strengths is that it can be trained and run for inference faster than more complicated architectures. This efficiency also makes it suitable for deployment on platforms with low computational power, such as smartphones and embedded systems, without much loss in accuracy. Its ability to operate under limited computational resources suggests it can be used in real-world settings where computing power is constrained.
However, some limitations should be considered when using EfficientNetV2B0. Its architecture is not very deep, so it may miss fine details that newer, deeper models capture, details that can matter greatly in demanding tasks such as skin cancer detection. This can reduce accuracy, especially when it is essential to distinguish between visually similar lesions. In our experiments, EfficientNetV2B0 achieved 88% accuracy, which falls short of the custom CNN-based model.
The FCDS-CNN model introduced in this research aims to provide skin cancer detection with a higher accuracy of 96%. As explained earlier, the FCDS-CNN was created from scratch with the peculiarities of dermoscopic images in mind, which helped it detect more relevant features. Moreover, data augmentation and class weighting made it easier to train the FCDS-CNN on the under-represented classes (four in this dataset) and increased overall accuracy. EfficientNetV2B0 offers significant benefits in efficiency and speed; however, the FCDS-CNN delivers better performance on the target task of skin cancer detection, making it more appropriate for this research.
The EfficientNetV2B0 architecture used in the current study is illustrated in Fig. 5 alongside the process flow. The figure shows an input layer that takes images of size 224×224×3. The images are fed into the EfficientNetV2B0 backbone, where several layers extract the essential features, producing a \(7 \times 7\times 1280\) output. This is reduced to a more manageable 1,280 units via a Global Average Pooling layer. Dense layers with 1,024 and 256 neurons then analyze the extracted features before the final dense layer classifies skin cancer into one of the seven classes. Table 4 describes this in detail, capturing the EfficientNetV2B0 model architecture: the output shapes, the types of layers, and the number of parameters at each stage. The table also reflects the model's efficiency, with approximately 7.5 million parameters, the majority of which are trainable. The architecture balances depth against computational complexity, allowing the model to operate with relatively low requirements while maintaining a high accuracy rate.
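The same head-building pattern applies here with the EfficientNetV2B0 backbone; the brief sketch below follows the dimensions given in Fig. 5, with the dense-layer activations again assumed.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# EfficientNetV2B0 backbone: 7x7x1280 feature map -> GAP -> 1024 -> 256 -> 7
backbone = tf.keras.applications.EfficientNetV2B0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
x = layers.GlobalAveragePooling2D()(backbone.output)  # -> 1280 units
x = layers.Dense(1024, activation="relu")(x)          # activation assumed
x = layers.Dense(256, activation="relu")(x)
outputs = layers.Dense(7, activation="softmax")(x)    # seven lesion classes
model = models.Model(backbone.input, outputs)
```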
Architecture of EfficientNetV2B0 model.
InceptionResNetV2
This study also used InceptionResNetV2, an off-the-shelf deep learning model from the Inception family. It combines inception modules with residual connections, making it well suited to complicated tasks like skin cancer diagnosis. This work achieved 93% accuracy using InceptionResNetV2, which is proficient at capturing fine image details thanks to its deep architecture and feature extraction capability. Nevertheless, InceptionResNetV2 is not without shortcomings: it places high demands on computational and time resources, and its long training and inference times become an issue when computing capacity is limited. Its strength as a flexible, accurate tool for generic image analysis can thus be counterproductive for application-specific tasks where fast turnaround and minimal resource use are necessary. This paper shows through its FCDS-CNN model design that a focused architecture is more precise, practical, and efficient for early skin cancer detection than the more generalized InceptionResNetV2.
Figure 6 depicts the architecture of the InceptionResNetV2 model used in this paper for skin cancer detection, explaining the various stages of the model. The model takes images of size 224×224×3 as input, with InceptionResNetV2 as the backbone extracting higher-level features through its deep architectural design. The output from this backbone, with dimensions 5×5×1536, is followed by a Global Average Pooling layer that reduces it to 1,536 units. Next comes a fully connected layer with 1,024 units, to which batch normalization and dropout were added, improving the model's resilience and reducing overfitting. Lastly, the model generates class probabilities for the seven categories through a dense layer with seven outputs.
Table 5 provides a comprehensive assessment of the InceptionResNetV2 structure, including the output shapes, layer types, and the number of parameters at every stage. The table highlights the model's sizable capacity, with over 56 million parameters, indicating its ability to handle complex, large-scale image classification tasks. Despite the model's depth and advanced design, which contribute to its high accuracy, the FCDS-CNN developed in this study was specifically optimized for the requirements of skin cancer detection, resulting in even better performance.
Architecture of InceptionResNetV2 model.
MobileNetV3
This work also applied a pre-trained model known as MobileNetV3 to skin cancer diagnosis. This lightweight deep learning model was developed mainly for mobile and embedded systems. It builds on the earlier MobileNetV2 network, adding improvements such as squeeze-and-excitation (SE) blocks and eliminating excess layers and complexity while retaining a high level of accuracy. MobileNetV3 is used widely in environments with limited computational resources and is therefore suitable for applications that require deep learning models on edge devices. Its advantages include a reduced number of parameters and short inference time, which is essential for real-time applications. Furthermore, the model imposes a lighter load and has fewer parameters than many traditional models, a strength in low-power situations. Despite these advantages, MobileNetV3 has limitations: its simplicity, optimized for quick training and inference, can slightly sacrifice accuracy, as it is less adaptable than deeper and more complex models, a drawback in fields such as medical imaging that require identifying small patterns.
In the present experiment, the MobileNetV3 model yielded 90% accuracy, which is good but slightly lower than the FCDS-CNN model. The FCDS-CNN proposed in this study was tailored to skin cancer detection, allowing it to capture more relevant and descriptive dermoscopic image features. Although MobileNetV3 excels in efficiency and versatile deployment, it is outperformed by the FCDS-CNN in accuracy and task-specific optimization. From the experimental results, the FCDS-CNN model offers better accuracy while keeping computational time under control, making it more applicable for early skin cancer detection.
Figure 7 summarizes the architecture and process flow of this model as employed in this study for skin cancer detection. The figure illustrates input images of dimensions 224×224×3, which the model processes using the MobileNetV3 backbone. This backbone efficiently extracts features from the input, reducing the dimensions to 7×7×960. The output is then reduced to 960 units with the help of a Global Average Pooling layer. The resulting feature vector is fed through dense layers with 1,024 and 512 neurons, respectively, followed by batch normalization and dropout to enhance the stability of the network. Finally, a last dense layer produces predictions over the seven classes. Table 6 further details the architecture of this model, with the output shapes, the layer types, and the number of parameters at each stage. Although MobileNetV3 is better in efficiency and speed, the custom CNN model is preferable here because of its higher accuracy and task-specific training.
Architecture of MobileNetV3 model.
Proposed FCDS-CNN model
The proposed FCDS-CNN in this research offers several advantages over prior work and pre-trained models. FCDS-CNN is among the recent approaches achieving higher accuracy in the automated classification of skin lesions, indicating great potential for enhancing diagnostic accuracy and early detection of skin cancers13. First, the model is optimized for skin cancer detection, leveraging features inherent in dermoscopic images. Unlike standard CNNs trained on general image datasets like ImageNet, the FCDS-CNN is specifically designed to capture the unique characteristics of dermoscopic images relevant to skin cancer. This task-specific design allows it to learn more discriminative features for accurate lesion classification, enabling more accurate and efficient prediction, particularly in diagnosing aggressive skin cancer at early stages. FCDS-CNN has been used in medical image analysis where pre-trained models applied through transfer learning have produced enhanced results on small datasets. FCDS-CNN and other deep learning algorithms have proven crucial in other complicated medical image analysis problems and bring improvements in the detection of skin cancer10. Furthermore, the FCDS-CNN architecture incorporates two key strategies to handle challenges common in medical image datasets. Prior research has shown that classification based on FCDS-CNN achieves high accuracy in skin lesion classification; nevertheless, data imbalance and model over-specialization to particular datasets remain areas that require research to make the model clinically useful11. Specifically, data augmentation techniques, including rotation, flipping, scaling, and zooming, are employed to create variations of existing images, artificially expanding the dataset and reducing the impact of class imbalance. In addition, a class weighting mechanism is implemented during training to give more weight to under-represented classes, ensuring that the model learns to identify all lesion types effectively even with a skewed data distribution. These strategies are crucial for improving performance on minority classes and enhancing the overall robustness of the model. Additionally, the model's training time is relatively short compared with large pre-trained models, since it is a comparatively small model. Compared to complex pre-trained models with millions of parameters, the FCDS-CNN utilizes a streamlined architecture with optimized layers (as shown in Figure 8 and Table 7) and a reduced parameter count, increasing computational efficiency and making it suitable for deployment in resource-constrained settings. Incorporating batch normalization and dropout layers further contributes to regularization, preventing overfitting and improving overall model performance. This allows the model to be deployed in environments that could not otherwise afford or support the hardware demands of state-of-the-art healthcare facilities, so the software can serve both advanced medical centers in the developed world and more rudimentary settings in the developing world. The use of data augmentation and class weighting achieves additional improvement.
Augmentation increases the model's capacity to classify unseen data by decreasing the probability of overfitting, particularly when operating on imbalanced datasets. Class weighting ensures that, during training, the minority classes receive enhanced representation, preventing the common classes from crowding out the rare ones and making diagnoses far more accurate across all kinds of skin cancer.
Moreover, the structures composing the model are scalable, meaning they can be fine-tuned for new datasets and different medical imaging applications. This versatility extends the model's applicability to future studies or diseases without developing a new model from scratch. These attributes make the FCDS-CNN a strong, flexible, and feasible tool for diagnosis in medical imaging, capable of significant improvements over conventional methods.
Figure 8 shows the architecture of the custom-built FCDS-CNN for skin cancer detection. It gives a detailed view of the data flow from the input layer through the convolutional, batch normalization, and pooling layers to the dense layers connected to the output layer. The model takes images as input and reshapes the data into a set of features it then uses to classify skin cancer into one of the seven categories. The detailed structure of the FCDS-CNN, including the output shapes, layer types, and number of parameters for each layer, is listed in Table 7. Presented together, these details show the model's hierarchical structure, the depth of the network, and how few parameters it uses to achieve such high accuracy. The model can thus capture the complex patterns in dermoscopic images while remaining reasonably fast.
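Since the exact layer sizes live in Fig. 8 and Table 7, the sketch below is only an illustrative reconstruction of a compact Conv-BN-Pool network of this kind; every filter count, the dense width, and the dropout rate are assumptions, not the published FCDS-CNN configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_fcds_cnn_sketch(num_classes=7, input_shape=(224, 224, 3)):
    """Illustrative stand-in for the FCDS-CNN: a small Conv-BN-Pool stack
    with a dense head; see Table 7 for the actual configuration."""
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64, 128):  # filter counts are assumptions
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation="relu")(x)  # width assumed
    x = layers.Dropout(0.5)(x)                   # regularization, as in the text
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)
```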
Illustration of the FCDS-CNN model architecture.
Illustration of all models.
Evaluation parameters
In the study, Eq. (1) is used to assess the accuracy of each ML model. Accuracy is defined as the nearness of a measurement to the actual value being sought and is a key metric for evaluating the models.
$$\begin{aligned} Accuracy=\frac{{Tr Pos}+{Tr Neg}}{{Tr Pos}+{Fa Pos}+{Tr Neg}+{Fa Neg}} \end{aligned}$$
(1)
Where:
$$\begin{aligned} Tr Pos&= \text {True Positives} \\ Tr Neg&= \text {True Negatives} \\ Fa Pos&= \text {False Positives} \\ Fa Neg&= \text {False Negatives} \end{aligned}$$
Eq. (2) is used to calculate the Precision value, a measure of how consistently the model's positive predictions are correct. It is defined from the numbers of True Positive (TrPos) and False Positive (FaPos) results, capturing the model's ability to recognize positive instances correctly. Higher precision means less variation in the outcome, so the results are highly reproducible and reliable.
$$\begin{aligned} Precision=\frac{{Tr Pos}}{{Tr Pos}+{Fa Pos}} \end{aligned}$$
(2)
Where: Tr Pos = True Positives, Fa Pos = False Positives
Eq. (3) defines recall as the model's capacity to discern all of the relevant instances in the data. Recall is the ratio of true positives (correct identifications) to the sum of true positives and false negatives (missed cases). A higher recall value indicates that the model recovers more of the positive instances and is thus better at finding the true positives.
$$\begin{aligned} Recall=\frac{{Tr Pos}}{{Tr Pos}+{Fa Neg}} \end{aligned}$$
(3)
Where: Tr Pos = True Positives, Fa Neg = False Negatives
Eq. (4) gives the F1 Score, computed as the harmonic mean of recall and precision. It is a balanced metric that considers both how completely the model identifies relevant examples (recall) and how accurately it does so (precision). The harmonic mean combines these metrics so that a high F1 Score requires the model to be effective on both precision and recall.
$$\begin{aligned} F1Score= 2 \times \frac{{Precision} \times {Recall}}{{Precision}+{Recall}} \end{aligned}$$
(4)
Where:
Tr Pos = True Positives,
Fa Pos = False Positives,
Fa Neg = False Negatives
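For completeness, the four metrics of Eqs. (1)-(4) can be computed directly from the confusion counts; the small helper below is a straightforward transcription, and the counts in the usage comment are purely hypothetical.

```python
def classification_metrics(tr_pos, tr_neg, fa_pos, fa_neg):
    """Evaluate Eqs. (1)-(4) from the four confusion-matrix counts."""
    accuracy = (tr_pos + tr_neg) / (tr_pos + tr_neg + fa_pos + fa_neg)
    precision = tr_pos / (tr_pos + fa_pos)
    recall = tr_pos / (tr_pos + fa_neg)
    f1_score = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1_score": f1_score}

# Example with hypothetical counts:
# classification_metrics(tr_pos=90, tr_neg=880, fa_pos=12, fa_neg=20)
```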