Nationwide real-world implementation of artificial intelligence for cancer detection in population-based mammography screening

The study was conducted within the organized breast cancer screening program in Germany, which targets asymptomatic women aged 50 to 69 years (Figure 1). All women participating in the screening program were eligible for inclusion in the study. Between July 1, 2021 and February 23, 2023, screening data were collected from 12 screening sites that used the artificial intelligence (AI) system (Extended Data Table 1). In the German mammography screening program, which follows a binding national guideline, four two-dimensional mammograms (craniocaudal and mediolateral oblique views of each breast) are acquired for each participating woman. The mammograms are first read independently by two radiologists (in some cases, a third radiologist is involved). If at least one radiologist considers the findings suspicious, a consensus conference is held. The consensus conference includes at least the initial readers and the physician responsible for the screening unit, but further radiologists of the screening site can participate. If the suspicious finding is upheld at the consensus conference, the woman is recalled for further diagnostic assessment, which can include, among other things, ultrasound, digital breast tomosynthesis, additional or magnification mammographic views, or magnetic resonance imaging.

Figure 1: Study profile.

The flowchart shows the inclusion of the study participants and their assignment to the study groups.

For the study, examinations were assigned to the AI group when at least one radiologist read and reported them using the AI-supported viewer. All examinations for which neither radiologist used the AI-supported viewer were assigned to the control group. Study group assignment was unknown to the women and the radiographers, as it had not yet been determined at the time of image acquisition. After image acquisition, AI predictions were computed for all women, but they were displayed only to radiologists using the AI-supported viewer. The radiologists performing the first and second reads were free to use either the existing reporting and viewer software without AI support or the AI-supported viewer. The decision to use the AI-supported viewer was made on a per-examination basis (that is, a radiologist usually read examinations in both the AI and the control group). Each radiologist chose independently of the other reader whether to use the AI-supported viewer. The AI results were also not revealed to the other radiologist if that radiologist did not choose to use the AI-supported viewer.
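The assignment rule above can be written as a small function. This is a minimal sketch: the `Examination` record and its field names are hypothetical and only encode whether each of the two independent readers opened the examination in the AI-supported viewer.

```python
from dataclasses import dataclass


@dataclass
class Examination:
    # Hypothetical record: did each independent reader use the AI-supported viewer?
    first_reader_used_ai: bool
    second_reader_used_ai: bool


def readers_using_ai(exam: Examination) -> int:
    """Number of the two independent readers (0, 1 or 2) who used the AI viewer."""
    return int(exam.first_reader_used_ai) + int(exam.second_reader_used_ai)


def study_group(exam: Examination) -> str:
    """'AI' if at least one reader used the AI-supported viewer, else 'control'."""
    return "AI" if readers_using_ai(exam) >= 1 else "control"


exams = [Examination(True, False), Examination(True, True), Examination(False, False)]
print([study_group(e) for e in exams])  # ['AI', 'AI', 'control']
```

Counting the readers separately also distinguishes the two AI subgroups reported later (AI viewer used by one reader versus by both readers).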

The AI system used was VARA MG (from the German company VARA), a CE-certified medical device designed to display mammograms (viewer software) and to pre-classify examinations to assist radiologists in routine reporting. The performance of previous versions of the AI software has been reported^12,18. When using the AI-supported viewer, radiologists were supported by two AI-based features (Figure 1):

1. Normal triaging. The software selects a subset of all examinations that the AI model considers highly unsuspicious. These examinations are tagged as "normal" in the worklist.
2. Safety net. The software selects a subset of all examinations that the AI model considers highly suspicious. Radiologists first read the examination without any additional support. If a radiologist reports the examination as unsuspicious, the safety net is activated, showing an alert together with a suggested localization of the suspicious region(s) in the images. The radiologist is then required to review the decision and to accept or reject the safety net suggestion.
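The interplay of the two features can be sketched as follows. The AI score, its [0, 1] scale and both cut-offs are hypothetical placeholders; the text does not state how the software derives its normal and safety net selections.

```python
# Hypothetical cut-offs on a hypothetical AI suspicion score in [0, 1];
# the values used by the commercial software are not given in the text.
NORMAL_CUTOFF = 0.05
SAFETY_NET_CUTOFF = 0.95


def ai_tag(score: float) -> str:
    """Pre-classify an examination from its (hypothetical) AI score."""
    if score <= NORMAL_CUTOFF:
        return "normal"        # shown with a "normal" tag in the worklist
    if score >= SAFETY_NET_CUTOFF:
        return "safety_net"    # highly suspicious: safety net is armed
    return "unclassified"


def reader_outcome(tag: str, reader_suspicious: bool,
                   accepts_suggestion: bool) -> tuple[bool, bool]:
    """Return (final_suspicious, safety_net_shown) for a single reader.

    The reader first decides without support. The safety net alert and the
    suggested localization appear only if the reader reports an armed
    examination as unsuspicious; the reader then accepts or rejects it.
    """
    if reader_suspicious or tag != "safety_net":
        return reader_suspicious, False
    return accepts_suggestion, True


print(reader_outcome(ai_tag(0.97), reader_suspicious=False, accepts_suggestion=True))
# (True, True)
```

Note that the "normal" tag does not change the reader's decision in this sketch; it only labels the examination in the worklist.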

Characteristics of the study population

Overall, 461,818 women attending breast cancer screening at the 12 study sites participated in the study. A total of 119 radiologists, forming 547 reader sets, interpreted the examinations. Mammograms were acquired with devices from five different vendors (2). Of all participating women, 260,739 were screened in the AI group (with the AI-supported viewer used by only one reader for 152,970 women and by both readers for 107,769 women) and 201,079 in the control group. Table 1 shows the characteristics of the screened women and of the detected breast cancers by study group. Among the screened women, 41.9 per 1,000 had suspicious findings and were recalled for further assessment. About a quarter of them (10.4 per 1,000) underwent biopsy, and breast cancer was diagnosed in 6.2 per 1,000. Most cancers (79.4 %) were classified as invasive, and 18.9 % as ductal carcinoma in situ (DCIS).

Table 1 Characteristics of the study population, overall and by study group

AI normal triaging, safety net triggering and acceptance rates

The AI tagged 56.7 % (262,055 of 461,818) of examinations as normal. This proportion was higher in the AI group (59.4 %) than in the control group (53.3 %; Table 2), reflecting the observed reading behavior. In the AI group (N = 260,739), the safety net was triggered for 3,959 examinations (1.5 %), was shown in 2,233 examinations (0.9 %) and was accepted in 1,077 examinations (0.4 %), leading to 541 (0.2 %) recalls and 204 (0.08 %) breast cancer diagnoses. Conversely, 8,032 (3.1 %) examinations in the AI group were referred to the consensus conference for further evaluation despite being tagged as normal by the AI, some of which led to a cancer diagnosis.
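All safety net percentages above are expressed relative to the whole AI group, not relative to the preceding funnel stage. A few lines of arithmetic reproduce them from the counts reported in the text:

```python
AI_GROUP_SIZE = 260_739  # examinations in the AI group

# Safety net funnel counts as reported in the text
safety_net_funnel = {
    "triggered": 3_959,
    "shown": 2_233,
    "accepted": 1_077,
    "recalled": 541,
    "cancer": 204,
}


def percent_of(count: int, total: int) -> float:
    """Share of `total` represented by `count`, in percent."""
    return 100.0 * count / total


for stage, count in safety_net_funnel.items():
    print(f"{stage:>9}: {count:>5,} ({percent_of(count, AI_GROUP_SIZE):.2f} %)")
```

The same helper gives the 3.1 % for the 8,032 normal-tagged examinations that nevertheless went to consensus.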

Table 2 Frequencies and contributions of AI to radiologists' decisions

Recall rate, cancer detection rate and positive predictive values

We controlled for the identified confounders (reader set and AI prediction; the causal graph is provided in Extended Data Fig. 2) through overlap weighting based on propensity scores (Extended Data Fig. 3). The model-based breast cancer detection rate (BCDR) per 1,000 women screened was 6.70 in the AI group and 5.70 in the control group. This corresponds to a model-based absolute difference of one additional cancer per 1,000 women screened and a relative increase of 17.6 % (95 % confidence interval (CI): +5.7 %, +30.8 %). The BCDR in the AI group was thus non-inferior and even statistically superior to that of the control group. The AI group had a non-inferior recall rate (37.4 per 1,000) compared with the control group (38.3 per 1,000), corresponding to a relative difference of −2.5 % (−6.5 %, +1.7 %) (Table 3). The positive predictive value (PPV) of recall was 17.9 % in the AI group and 14.9 % in the control group. The biopsy rate in the AI group was 8.2 % higher (−0.4 %, +17.6 %) than in the control group. Nevertheless, the PPV of biopsy was statistically significantly higher in the AI group (+9.0 % (+2.0 %, +16.4 %)).
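Overlap weighting gives each AI-group examination the weight 1 − e(x) and each control examination the weight e(x), where e(x) is the propensity score, i.e., the estimated probability of belonging to the AI group given the confounders. The sketch below assumes the propensity scores have already been estimated (the text names reader set and AI prediction as confounders but not the fitting procedure) and computes weighted rates per 1,000 and their relative difference. It is an illustration of the technique, not the study's analysis code; the `Exam` record is hypothetical.

```python
from typing import NamedTuple


class Exam(NamedTuple):
    in_ai_group: bool   # True = AI group, False = control group
    propensity: float   # estimated P(AI group | confounders), assumed given
    cancer: int         # 1 if breast cancer was detected, else 0


def overlap_weight(exam: Exam) -> float:
    """Overlap weights: 1 - e(x) for the AI group, e(x) for controls."""
    return 1.0 - exam.propensity if exam.in_ai_group else exam.propensity


def weighted_rate_per_1000(exams: list[Exam], ai_group: bool) -> float:
    """Overlap-weighted detection rate per 1,000 examinations in one group."""
    group = [e for e in exams if e.in_ai_group == ai_group]
    total_weight = sum(overlap_weight(e) for e in group)
    return 1000.0 * sum(overlap_weight(e) * e.cancer for e in group) / total_weight


def relative_difference(exams: list[Exam]) -> float:
    """Relative difference, AI versus control (0.176 would mean +17.6 %)."""
    return weighted_rate_per_1000(exams, True) / weighted_rate_per_1000(exams, False) - 1.0
```

With constant propensity scores all weights are equal and the estimate reduces to the crude rate ratio minus one; overlap weights shrink the influence of examinations whose propensity is close to 0 or 1.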

Table 3 Model-based BCDRs, recall rates, biopsy rates, consensus rates and corresponding differences between the AI and control groups

Subgroup analyses

Subgroup analyses showed that the BCDR increased in all subgroups defined by screening round, breast density and age, with relative increases ranging from +12 % to +23 % (Table 4). The 95 % CIs were entirely positive for the subgroups of follow-up screening rounds and non-dense breasts, and for women aged 60 to 69 years.

Table 4 Model-based BCDRs, recall rates and corresponding differences between the AI and control groups by screening round, breast density and age group

Relative differences in recall rates across subgroups ranged from −5 % (women aged 50 to 59 years) to +4 % (women aged 60 to 69 years), but all CIs included zero, with the exception of that for women aged 50 to 59 years.

Sensitivity analyses

We performed several sensitivity analyses, all of which showed that our results were robust to different analytical decisions.

In a model that, in addition to AI prediction and reader set, adjusted for age, screening round, breast density and supervision in the propensity score model, the BCDR increase remained unchanged at 17.6 % (5.7 %, 30.8 %). Likewise, in the additionally adjusted model, the differences in the PPV of recall and biopsy (18.3 % (…)) remained significant after the additional adjustments.

Sensitivity analyses in which we adjusted for each individual reader instead of the reader set also yielded results similar to the main results: in the AI group, the BCDR was 19.0 % (7.4 %, 31.8 %) higher and the relative difference in recall rate was −1.5 % (−5.4 %, +2.6 %), indicating that the results were robust to different specifications of the reader set variable.

The results were robust to sampling error, as they remained nearly unchanged when the study sample was varied (bootstrapping and random selection of an 80 % subsample, each repeated 1,000 times): the BCDR increase was 17.6 % (5.7 %, 30.8 %) with bootstrapping and 17.4 % (11.4 %, 23.8 %) with subsampling.
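The two resampling checks can be sketched generically. Here `estimator` is any function that maps a sample of records to a number (in the study setting, the adjusted BCDR comparison). The text does not say how intervals were derived from the 1,000 replicates, so the percentile interval used below is an assumption, as are all function names.

```python
import math
import random
import statistics
from typing import Callable, Sequence


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Percentile (0 <= q <= 100) with linear interpolation between order statistics."""
    k = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def resampling_interval(data: Sequence, estimator: Callable[[Sequence], float],
                        mode: str = "bootstrap", n_rep: int = 1000,
                        fraction: float = 0.8, seed: int = 0) -> tuple[float, float]:
    """95 % percentile interval of `estimator` over resampled versions of `data`.

    mode="bootstrap": draw len(data) records with replacement.
    mode="subsample": draw `fraction` of the records without replacement.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_rep):
        if mode == "bootstrap":
            sample = rng.choices(data, k=n)
        elif mode == "subsample":
            sample = rng.sample(data, k=int(fraction * n))
        else:
            raise ValueError(f"unknown mode: {mode}")
        estimates.append(estimator(sample))
    estimates.sort()
    return percentile(estimates, 2.5), percentile(estimates, 97.5)


values = [float(v) for v in range(100)]
print(resampling_interval(values, statistics.mean))
print(resampling_interval(values, statistics.mean, mode="subsample"))
```

The toy calls only resample the numbers 0 to 99 and summarize them by their mean; both intervals should bracket the full-sample mean of 49.5.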

An alternative to propensity score-based overlap weighting is inverse probability of treatment weighting with trimming. After applying various trimming thresholds (Extended Data Table 4), the results remained similar.
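Inverse probability of treatment weighting (IPTW) weights AI-group examinations by 1/e(x) and controls by 1/(1 − e(x)), so examinations with extreme propensity scores receive very large weights; trimming guards against this. The text does not specify the trimming rule, so this sketch assumes the common variant that excludes examinations with e(x) outside [t, 1 − t] for a threshold t (capping weights at a maximum is another option). Propensity scores are assumed to be given, and the record type is hypothetical.

```python
from typing import NamedTuple


class Exam(NamedTuple):
    in_ai_group: bool
    propensity: float   # estimated P(AI group | confounders), assumed given
    cancer: int


def iptw_weight(exam: Exam) -> float:
    """Inverse probability of treatment weight: 1/e(x) or 1/(1 - e(x))."""
    return 1.0 / exam.propensity if exam.in_ai_group else 1.0 / (1.0 - exam.propensity)


def trimmed_iptw_relative_difference(exams: list[Exam], threshold: float) -> float:
    """Relative difference in cancer detection (AI vs control) after dropping
    examinations whose propensity lies outside [threshold, 1 - threshold]."""
    kept = [e for e in exams if threshold <= e.propensity <= 1.0 - threshold]

    def weighted_rate(ai_group: bool) -> float:
        group = [e for e in kept if e.in_ai_group == ai_group]
        return (sum(iptw_weight(e) * e.cancer for e in group)
                / sum(iptw_weight(e) for e in group))

    return weighted_rate(True) / weighted_rate(False) - 1.0


exams = ([Exam(True, 0.5, 1), Exam(True, 0.5, 0), Exam(True, 0.01, 1)]
         + [Exam(False, 0.5, 1)] + [Exam(False, 0.5, 0)] * 3)
for t in (0.0, 0.05):
    print(t, round(trimmed_iptw_relative_difference(exams, t), 3))
```

In the toy data, a single AI-group examination with e(x) = 0.01 carries a weight of 100 and dominates the untrimmed estimate; trimming at t = 0.05 removes it.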

Another alternative to propensity score weighting as a method of confounder adjustment is stratification. Again, the results of sensitivity analyses including all confounder strata that contained a minimum sample size (between 0 and 200) in each group were consistent with the main results.
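Stratification replaces weights with strata of the confounders. The sketch assumes the strata are already formed (for example, from reader set and binned AI prediction) and pools stratum-specific rates with weights proportional to stratum size; this pooling scheme and the handling of `min_n` are illustrative assumptions, not the study's specification.

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable


def stratified_relative_difference(
    records: Iterable[tuple[Hashable, bool, int]], min_n: int = 0
) -> float:
    """Relative difference (AI vs control) pooled over confounder strata.

    `records` holds (stratum, in_ai_group, cancer) tuples. Strata with fewer
    than `min_n` examinations (and always at least 1) in either group are
    excluded. Stratum-specific rates are combined with weights proportional
    to stratum size (direct standardization to the pooled strata).
    """
    strata: dict = defaultdict(lambda: {True: [], False: []})
    for stratum, in_ai_group, cancer in records:
        strata[stratum][in_ai_group].append(cancer)

    required = max(min_n, 1)
    expected_ai = expected_control = 0.0
    for groups in strata.values():
        ai, control = groups[True], groups[False]
        if len(ai) < required or len(control) < required:
            continue  # stratum lacks the minimum sample size in a group
        size = len(ai) + len(control)
        expected_ai += size * sum(ai) / len(ai)
        expected_control += size * sum(control) / len(control)
    return expected_ai / expected_control - 1.0


records = (
    [("A", True, c) for c in (1, 1, 0, 0)] + [("A", False, c) for c in (1, 0, 0, 0)]
    + [("B", True, c) for c in (1, 0)] + [("B", False, c) for c in (1, 1)]
    + [("C", True, 1)]  # stratum without controls: always excluded
)
for min_n in (0, 3):
    print(min_n, stratified_relative_difference(records, min_n))
```

Raising `min_n` drops small strata, which is the lever the sensitivity analysis varied between 0 and 200.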

We conducted a placebo (sham) intervention analysis to verify whether the AI effect observed in the main analysis would disappear, as it should, when only a sham intervention is analyzed while all model assumptions are retained (i.e., a nonzero effect would indicate residual confounding due to reading behavior). As expected, the model-based mean difference was close to zero (0.8 % (−9.9 %, 11.6 %)), indicating no residual confounding.
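The logic of a placebo check can be illustrated with a deliberately simplified sketch: it omits the propensity score model and assigns the sham label at random, so it only shows that an estimator returns approximately zero when no intervention exists. In the study, the sham intervention was analyzed under all model assumptions, which is what makes the check sensitive to residual confounding. The data are simulated and the function name is hypothetical.

```python
import random


def sham_relative_difference(outcomes: list[int], seed: int = 0) -> float:
    """Assign a random sham 'AI' label to each record (no real intervention)
    and return the relative difference in outcome rates between the two arms."""
    rng = random.Random(seed)
    labels = [rng.random() < 0.5 for _ in outcomes]
    sham_ai = [y for y, lab in zip(outcomes, labels) if lab]
    sham_control = [y for y, lab in zip(outcomes, labels) if not lab]
    rate_ai = sum(sham_ai) / len(sham_ai)
    rate_control = sum(sham_control) / len(sham_control)
    return rate_ai / rate_control - 1.0


# Simulated outcomes with a 5 % event rate; with random sham labels, an
# estimate far from zero would point to a problem in the estimation itself.
data_rng = random.Random(42)
outcomes = [int(data_rng.random() < 0.05) for _ in range(200_000)]
print(f"sham effect: {100 * sham_relative_difference(outcomes):+.1f} %")
```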

Reading times and workload reduction

Reading time per examination was measured only in the AI group, because it was not technically possible to measure it in the control group. On average, examinations tagged as normal were read more quickly (median reading time, 16 seconds) than unclassified examinations (median reading time, 30 seconds) and safety net examinations (median reading time, 99 seconds). Overall, radiologists spent 43 % less time interpreting examinations tagged as normal, with a mean reading time of 39 seconds for normal examinations compared with 67 seconds for non-normal examinations (Extended Data Fig. 4).
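The relative saving compares the two mean reading times directly. Applied to the rounded means quoted in the text, this simple ratio gives about 42 %; the text does not say how the reported 43 % was computed, so the gap may reflect rounding of the means or a different estimator.

```python
def relative_time_saving(mean_seconds_fast: float, mean_seconds_slow: float) -> float:
    """Fractional reduction in mean reading time of one category versus another."""
    return 1.0 - mean_seconds_fast / mean_seconds_slow


saving = relative_time_saving(39, 67)   # normal-tagged vs non-normal (rounded means)
print(f"{100 * saving:.1f} % less reading time")  # 41.8 % less reading time
```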

To assess the potential of AI to reduce the reading workload through automation, we analyzed a hypothetical scenario in which examinations tagged as normal by the AI were not read by radiologists. Instead, after an AI prediction of "normal", the examination directly received the final classification "normal"; consequently, breast cancers among these examinations that had been detected via the consensus conference or recall would go undetected. The analysis of this scenario showed that when all normal-tagged examinations (56.7 %) were automatically classified as normal, the BCDR was still 16.7 % higher and statistically superior (95 % CI: +4.9 %, +29.9 %), and the consensus rate was 19.4 % lower (…).
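The counterfactual scenario amounts to overwriting the outcome of every AI-normal examination before recomputing rates. The sketch below uses hypothetical record fields and toy data and computes crude rates only; the study's estimates for this scenario were model-based and compared against the control group, which the sketch does not attempt.

```python
from typing import NamedTuple


class ScreeningResult(NamedTuple):
    ai_tagged_normal: bool
    recalled: bool
    cancer_detected: bool


def automate_normal_tags(results: list[ScreeningResult]) -> list[ScreeningResult]:
    """Hypothetical scenario: AI-normal examinations skip human reading and are
    finalized as normal, so recalls and cancer detections among them are lost."""
    return [r._replace(recalled=False, cancer_detected=False) if r.ai_tagged_normal else r
            for r in results]


def rate_per_1000(results: list[ScreeningResult], field: str) -> float:
    """Events per 1,000 examinations for a boolean field such as 'recalled'."""
    return 1000.0 * sum(getattr(r, field) for r in results) / len(results)


observed = [
    ScreeningResult(ai_tagged_normal=True, recalled=True, cancer_detected=True),
    ScreeningResult(ai_tagged_normal=False, recalled=True, cancer_detected=True),
    ScreeningResult(ai_tagged_normal=False, recalled=True, cancer_detected=False),
    ScreeningResult(ai_tagged_normal=True, recalled=False, cancer_detected=False),
]
automated = automate_normal_tags(observed)
for field in ("recalled", "cancer_detected"):
    print(field, rate_per_1000(observed, field), rate_per_1000(automated, field))
```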

Table 5 Model-based BCDRs, recall rates and corresponding differences between the AI and control groups for hypothetical AI automation