Leveraging Artificial Intelligence to Enhance Peer Review: Missed Liver Lesions on Computed Tomographic Pulmonary Angiography
Purpose
The aim of this study was to use artificial intelligence (AI) to facilitate peer review for detection of missed suspicious liver lesions (SLLs) on CT pulmonary angiographic (CTPA) examinations.
Methods
This retrospective study included 1 month of consecutive CTPA examinations from a multisite teleradiology practice. Visual classification (VC) software analyzed images for the presence (+) or absence (−) of SLLs (>1 cm, >20 Hounsfield units). Separately, a natural language processing (NLP) algorithm evaluated corresponding reports for description (+) of an SLL or lack thereof (−). Studies containing possible missed SLLs (VC+/NLP−) were reviewed by three abdominal radiologists in a two-step adjudication process to confirm if an SLL was missed by the interpreting radiologist. The number of VC+/NLP− cases, the number of images needing radiologist review, and the number of cases with confirmed missed SLLs were recorded. Interobserver agreement for SLLs was calculated for the radiologist readers.
Results
A total of 2,573 CTPA examinations were assessed, and 136 were classified as potentially containing missed SLLs (VC+/NLP−). After radiologist review, 13 cases with missed SLLs were confirmed, representing 0.5% of analyzed CT studies. Using AI, the ratio of CT studies requiring review to missed SLLs identified was 10:1; the ratio without the help of AI would be at least 66:1. Among the 136 cases reviewed by radiologists, interobserver agreement for SLLs was excellent (κ = 0.91).
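The VC+/NLP− triage rule and the review ratio reported above can be sketched as follows. This is a minimal illustration and not the study's software: the `Exam` record and its field names are hypothetical, and only the counts 2,573, 136, and 13 come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Exam:
    """Hypothetical record pairing the two AI outputs for one CTPA study."""
    exam_id: str
    vc_positive: bool   # image classifier flagged a suspicious liver lesion
    nlp_positive: bool  # report text describes a suspicious liver lesion


def possible_misses(exams):
    """Return exams flagged on images but not described in the report (VC+/NLP-)."""
    return [e for e in exams if e.vc_positive and not e.nlp_positive]


def review_ratio(reviewed, confirmed):
    """Studies a radiologist must review per confirmed missed lesion."""
    return reviewed / confirmed


# Counts taken from the Results paragraph above.
total_exams, flagged, confirmed = 2573, 136, 13
ratio_with_ai = review_ratio(flagged, confirmed)  # about 10.5, reported as 10:1
miss_rate = confirmed / total_exams               # about 0.005, reported as 0.5%
```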
Conclusions
AI can accelerate meaningful peer review by rapidly assessing thousands of examinations to identify potentially clinically significant errors. Although radiologist involvement is necessary, the amount of effort required after initial AI screening is dramatically reduced.
ACR Appropriateness Criteria® Imaging After Shoulder Arthroplasty: 2021 Update
Shoulder arthroplasty is a common orthopedic procedure with a complication rate reported to be as high as 39.8% and revision rates as high as 11%. Symptoms related to postoperative difficulties include activity-related pain, decreased range of motion, and apprehension. Some patients report immediate and persistent dissatisfaction, although others report a symptom-free postoperative period followed by increasing pain and decreasing shoulder function and mobility. Imaging plays an important role in diagnosing postoperative complications of shoulder arthroplasties. The imaging algorithm should always begin with radiographs. The selection of the next imaging modality depends on several factors, including findings on the initial imaging study, clinical suspicion of an osseous versus soft-tissue injury, and clinical suspicion of infection. The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision include an extensive analysis of current medical literature from peer reviewed journals and the application of well-established methodologies (RAND/UCLA Appropriateness Method and Grading of Recommendations Assessment, Development, and Evaluation or GRADE) to rate the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where evidence is lacking or equivocal, expert opinion may supplement the available evidence to recommend imaging or treatment.
Multi-institutional evaluation of a deep learning model for fully automated detection of aortic aneurysms in contrast and non-contrast CT
We developed and validated a research-only deep learning (DL) based automatic algorithm to detect thoracic and abdominal aortic aneurysms on contrast and non-contrast CT images and compared its performance with assessments obtained from retrospective radiology reports. The DL algorithm was developed using 556 CT scans. Manual annotations of aorta centerlines and cross-sectional aorta boundaries were created to train the algorithm. Aorta segmentation and aneurysm detection performances were evaluated on 2263 retrospective CT scans (154 thoracic and 176 abdominal aneurysms). Evaluation was performed by comparing the automatically detected aneurysm status to the aneurysm status reported in the radiology reports and the AUC was reported. In addition, a quantitative evaluation was performed to compare the automatically measured aortic diameters to manual diameters on a subset of 59 CT scans. Pearson correlation coefficient was used. For aneurysm detection, the AUC was 0.95 for thoracic aneurysm detection (95% confidence region [0.93, 0.97]) and 0.94 for abdominal aneurysm detection (95% confidence region [0.92, 0.96]). For aortic diameter measurement, the Pearson correlation coefficient was 0.973 (p<0.001).
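For readers unfamiliar with the AUC metric quoted above, the sketch below computes it by pairwise comparison of positive and negative cases (the Mann-Whitney formulation). The abstract does not say how the AUC was computed, so this is one standard approach rather than the authors' method; the function name and the toy data are invented for illustration.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 if the report states an aneurysm, 0 otherwise.
    scores: model output, higher meaning more likely aneurysm.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Each positive/negative pair counts 1 if ranked correctly, 0.5 if tied.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.2, 0.1]
toy_auc = auc_from_scores(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of cases, which is acceptable for a few thousand scans; a rank-based implementation would scale better.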
Interpretations of Examinations Outside of Radiologists’ Fellowship Training: Assessment of Discrepancy Rates Among 5.9 Million Examinations from a National Teleradiology Databank
Background
In community settings, radiologists commonly function as multispecialty radiologists, interpreting examinations outside of their fellowship training.
Objective
To compare discrepancy rates for preliminary interpretations of acute community-setting examinations concordant versus discordant with interpreting radiologists’ fellowship training.
Methods
This retrospective study used the databank of a U.S. teleradiology company that provides preliminary interpretations for client community hospitals. The analysis included 5,883,980 acute examinations performed from 2012 to 2016 that were preliminarily interpreted by 269 teleradiologists with a fellowship of neuroradiology, abdominal radiology, or musculoskeletal radiology. When providing final interpretations, client on-site radiologists voluntarily submitted quality assurance (QA) requests if preliminary and final interpretations were discrepant; the teleradiology company’s QA committee categorized discrepancies as major (n=8,444) or minor (n=17,208). Associations among examination type (common vs advanced), relationship between examination subspecialty and the teleradiologist’s fellowship (concordant vs discordant), and major and minor discrepancies were assessed using three-way conditional analyses with generalized estimating equations.
Results
For examinations with concordant subspecialty, major discrepancy rate was lower for common than advanced examinations [0.13% vs 0.26%; relative risk (RR) 0.50, 95% CI: 0.42, 0.60; p < .001]. For examinations with discordant subspecialty, major discrepancy rate was lower for common than advanced examinations (0.14% vs 0.18%; RR 0.81, 95% CI: 0.72, 0.90; p < .001). For common examinations, major discrepancy rate was not different between examinations with concordant versus discordant subspecialty (0.13% vs 0.14%; RR 0.90, 95% CI: 0.81, 1.01; p = .07). For advanced examinations, major discrepancy rate was higher for examinations with concordant versus discordant subspecialty (0.26% vs 0.18%; RR 1.45, 95% CI: 1.18, 1.79; p < .001). Minor discrepancy rate was higher among advanced examinations for those with concordant versus discordant subspecialty (0.34% vs 0.29%; RR 1.17, 95% CI: 1.001, 1.36; p = .04), but not different for other comparisons (p > .05).
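The relative risks above come from generalized estimating equations, which account for clustering by radiologist. As a simpler point of reference, the sketch below computes an unadjusted relative risk with a Wald confidence interval on the log scale. It is not the study's model, and the counts are hypothetical, chosen only to mirror the reported 0.13% and 0.26% rates.

```python
import math


def relative_risk(events_a, total_a, events_b, total_b, z=1.96):
    """Unadjusted relative risk of group A vs group B with a Wald CI.

    The interval is built on the log scale, the usual large-sample
    approximation for a ratio of two proportions. z=1.96 gives 95%.
    """
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    rr = risk_a / risk_b
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts: 130 and 260 major discrepancies per 100,000 examinations.
rr, ci_low, ci_high = relative_risk(130, 100_000, 260, 100_000)
```

Because this version ignores within-radiologist correlation, its interval will generally differ from the GEE-based intervals reported in the abstract.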
Conclusion
Major and minor discrepancy rates were not higher for acute community-setting examinations outside of interpreting radiologists’ fellowship training. Discrepancy rates increased for advanced examinations.
Radiologist errors by modality, anatomic region, and pathology for 1.6 million exams: what we have learned
Purpose
To evaluate the feasibility of adding pathology to recent radiologist error characterization schemes of modality and anatomic region and the potential of this data to more specifically inform peer review and peer learning.
Methods
Quality assurance data originating from 349 radiologists in a national teleradiology practice were collected for 2019. Interpretive errors were categorized as major or minor. Reporting or communication errors were classified as administrative errors. Interpretive errors were then divided by modality and anatomic region and placed into one of 64 pathologic categories.
Results
Out of 1,628,464 studies, the discrepancy rate was 0.5% (8181/1,634,201). The 8181 total errors consisted of 2992 major errors (0.18%) and 5189 minor errors (0.32%). Precisely, 3.1% (257/8181) of total errors were administrative. Of major interpretive errors, 75.5% occurred on CT, with CT abdomen and pelvis accounting for 40.4%. The most common pathologic discrepancy for all exams was in the category of mass, nodule, or adenopathy (1583/8181), the majority of which were minor (1315/1583). The most common pathologic discrepancy for the 2937 major interpretive errors was fracture or dislocation (27%; 793/2937), followed by bleed (10.7%; 315/2937).
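The percentages in this paragraph can be rechecked directly from the printed counts. The short sketch below does so; the helper name `pct` is ours, and the study denominator used is the one shown in the fraction "8181/1,634,201".

```python
def pct(numerator, denominator, digits=1):
    """Percentage rounded to the given number of decimal places."""
    return round(100 * numerator / denominator, digits)


# Counts as printed in the Results paragraph above.
total_errors = 8181
study_denominator = 1_634_201
major_interpretive = 2937

rates = {
    "overall discrepancy": pct(total_errors, study_denominator),
    "administrative share of errors": pct(257, total_errors),
    "fracture or dislocation": pct(793, major_interpretive),
    "bleed": pct(315, major_interpretive),
    "minor share of mass, nodule, or adenopathy": pct(1315, 1583),
}
```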
Conclusion
The addition of error-related pathology to peer review is both feasible and practical and provides a more detailed guide to targeted individual and practice-wide peer learning quality improvement efforts. Future research is needed to determine if there are measurable improvements in detection or interpretation of specific pathologies following error feedback and educational interventions.
Measurement of Endotracheal Tube Positioning on Chest X-Ray Using Object Detection
Patients who are intubated with endotracheal tubes often receive chest x-ray (CXR) imaging to determine whether the tube is correctly positioned. When these CXRs are interpreted by a radiologist, they evaluate whether the tube needs to be repositioned and typically provide a measurement in centimeters between the endotracheal tube tip and carina. In this project, a large dataset of endotracheal tube and carina bounding boxes was annotated on CXRs, and a machine-learning model was trained to generate these boxes on new CXRs and to calculate a distance measurement between the tube and carina. This model was applied to a gold standard annotated dataset, as well as to all prospective data passing through our radiology system for two weeks. Inter-radiologist variability was also measured on a test dataset. The distance measurements for both the gold standard dataset (mean error = 0.70 cm) and prospective dataset (mean error = 0.68 cm) were noninferior to inter-radiologist variability (mean error = 0.70 cm) within an equivalence bound of 0.1 cm. This suggests that this model performs at an accuracy similar to human measurements, and these distance calculations can be used for clinical report auto-population and/or worklist prioritization of severely malpositioned tubes.
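One way a tip-to-carina distance could be derived from two predicted bounding boxes is sketched below. The abstract does not describe the geometry used, so every choice here is an assumption: the tube tip is taken as the bottom-center of the tube box, the carina as the center of its box, and a single isotropic pixel spacing converts pixels to centimeters.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (origin at top-left)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def tube_tip(box: Box):
    """Assumed tip location: bottom-center of the endotracheal tube box."""
    return ((box.x_min + box.x_max) / 2, box.y_max)


def box_center(box: Box):
    """Center point of a box, used here as the assumed carina location."""
    return ((box.x_min + box.x_max) / 2, (box.y_min + box.y_max) / 2)


def tip_to_carina_cm(tube: Box, carina: Box, pixel_spacing_mm: float) -> float:
    """Euclidean distance between the assumed tip and carina, in centimeters."""
    tip_x, tip_y = tube_tip(tube)
    carina_x, carina_y = box_center(carina)
    pixels = math.hypot(carina_x - tip_x, carina_y - tip_y)
    return pixels * pixel_spacing_mm / 10.0


# Invented boxes and spacing, for illustration only.
example_cm = tip_to_carina_cm(Box(100, 50, 120, 300), Box(90, 480, 150, 520), 0.2)
```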
Effect of Independent Resident Night Call Versus 24-7 Attending Radiologist Coverage on Subsequent Practice Performance
The traditional model of residency education in radiology, which has prevailed since the first residency programs in radiology were established in the 1930s, has included independent or autonomous after-hours coverage, in which radiology residents provide after-hours interpretations independently. Under this scenario, referring clinicians base their care decisions on the radiology residents’ preliminary interpretations, which are accepted as being subject to revision following attending radiologist review. Radiology residents receive delayed supervision by attending radiologists, most commonly the following morning, at which time any errors uncovered are remediated.
This standard practice has been frequently evaluated and is generally accepted as being extremely safe for patients, with a very low rate of significant resident errors and creating no appreciable increased risk of patient harm resulting from the residents’ role.
Detection of Superior Mesenteric Artery Occlusion on Abdominal CT Using a Machine Learning Model
The superior mesenteric artery (SMA) branches from the aorta and is a major arterial supplier of the intestines. Occlusion of the SMA often results in bowel ischemia which can lead to severe morbidity or death. SMA occlusion can be missed on abdominal CT in clinical practice. Our radiology practice processes a high volume of abdominal CT studies and developing a screening tool for this pathology could improve patient care.
Hypothesis
We hypothesized that by training a bounding box-based machine learning model on a labeled dataset of both occluded and non-occluded SMA studies, this model could be used to prospectively screen patients for SMA occlusion (SMAO). The results could then be used in a quality assurance (QA) pipeline. We also hypothesized this model could be implemented with high specificity to avoid a large number of false positives in a high-throughput setting.
Methods
A natural language processing (NLP) model was used to select 142 retrospective post-contrast abdominal CT studies from our database that were positive for SMAO. The occlusions on each slice in these series were segmented by a radiologist and the segmentations were converted into bounding box data. A total of 1,286 images with SMAO were used. Additionally, 1,286 post-contrast abdominal CT images negative for SMAO were added to the training dataset. The model was trained using the YOLOv3 bounding box framework with a single output label for SMAO. A training/validation split of 90/10 was used, and the trained model was run on a test set containing SMAO and non-SMAO studies. The model was incorporated into our prospective data pipeline and run on incoming data, comparing the model result to the NLP result for a two-week period.
Results
The best validation mean average precision (mAP) achieved during training was 0.616. The test set AUC was 0.936. For prospective data, 21,326 post-contrast abdominal CT studies passed through our system over a two-week period with a sensitivity and specificity of 50.0% and 99.4%, respectively. As a relatively rare condition, only 8 SMAO cases came through during this time, of which 4 were identified. Many of the false positives contained atherosclerosis or partial occlusions, which were considered negative for the purposes of this study but may also be important clinically.
Conclusion
A machine learning model was trained that identifies SMAO in the clinical setting with high specificity and reasonable sensitivity. This model can be incorporated into a clinical workflow to avoid missed diagnoses of this critical condition.
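The prospective sensitivity and specificity follow from a standard confusion matrix, sketched below. The 8 positive cases and 4 detections are from the abstract. The false-positive count is not reported, so the value here is back-calculated from the stated 99.4% specificity and should be read as approximate.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positives that the model flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual negatives that the model left unflagged."""
    return true_neg / (true_neg + false_pos)


# Reported: 8 SMAO cases in two weeks, 4 of them identified by the model.
true_pos, false_neg = 4, 4
sens = sensitivity(true_pos, false_neg)

# Not reported: the false-positive count. Back-calculated, approximate only.
negatives = 21_326 - 8
approx_false_pos = round((1 - 0.994) * negatives)
approx_true_neg = negatives - approx_false_pos
spec = specificity(approx_true_neg, approx_false_pos)
```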
Natural Language Processing and Machine Learning for Detection of Respiratory Illness by Chest CT Imaging and Tracking of COVID-19 Pandemic in the US
Background
Coronavirus disease 2019 (COVID-19) has spread quickly throughout the United States (US) causing significant disruption in healthcare and society. Tools to identify hot spots are important for public health planning. The goal of our study was to determine if natural language processing (NLP) algorithm assessment of thoracic computed tomography (CT) imaging reports correlated with the incidence of official COVID-19 cases in the US.
Methods
Using de-identified HIPAA compliant patient data from our common imaging platform interconnected with over 2,100 facilities covering all 50 states, we developed three NLP algorithms to track positive CT imaging features of respiratory illness typical in SARS-CoV-2 viral infection. We compared our findings against the number of official COVID-19 cases on a daily, weekly, and state-wide basis.
Results
The NLP algorithms were applied to 450,114 patient chest CT comprehensive reports gathered from January 1st to October 3rd, 2020. The best performing NLP model exhibited strong correlation with daily official COVID-19 cases (r² = 0.82, p < 0.005). The NLP models demonstrated an early rise in cases followed by the increase of official cases, suggesting the possibility of an early predictive marker, with strong correlation to official cases on a weekly basis (r² = 0.91, p < 0.005). There was also substantial correlation between the NLP and official COVID-19 incidence by state (r² = 0.92, p < 0.005).
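A correlation of this kind can be computed from two aligned count series, as in the sketch below. It assumes the reported statistic is the squared Pearson coefficient, which the abstract does not state explicitly, and the weekly counts are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Invented weekly counts: NLP-positive CT reports vs official cases.
weekly_nlp_positive = [120, 150, 210, 400, 380]
weekly_official_cases = [1000, 1300, 2100, 3900, 4100]
r_squared = pearson_r(weekly_nlp_positive, weekly_official_cases) ** 2
```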
Conclusion
Using big data, we developed a novel machine-learning based NLP algorithm that can track imaging findings of respiratory illness detected on chest CT imaging reports with strong correlation with the progression of the COVID-19 pandemic in the US.
Automated Segmentation and Worklist Prioritization of Pneumoperitoneum in Abdominal CT Images Using a Convolutional Neural Network
Purpose
Pneumoperitoneum, the presence of free gas in the peritoneal cavity, can be a sign of critical pathology such as bowel perforation or trauma. Pneumoperitoneum is often diagnosed with abdominal CT and early detection is important to a patient’s outcome. Our institution processes approximately 3,300 abdominal CT studies per day, of which 1.3% are positive for pneumoperitoneum. We hypothesized that a convolutional neural network could be trained to detect pneumoperitoneum in prospective patients in order to expedite patient care.
Method and materials
Natural language processing (NLP) of radiology CT reports was used retrospectively to identify 297 body CT studies containing pneumoperitoneum. Axial CT images of these studies were annotated by a Board Certified radiologist to train a convolutional neural network. The training dataset consisted of 2,986 positive images and their segmentations, along with an equal number of negative images. A uNet model was trained using ResNet32 as the backbone. The model was first applied to a test cohort of 100 patients. This model was then integrated with our teleradiology pipeline to screen prospective patients for pneumoperitoneum in real-time, with NLP of the subsequent radiology report used as ground truth.
Results
The model achieved an AUC of 0.906 on the test dataset. A detection threshold of 3 cc pneumoperitoneum was selected. Over a two-week period, for prospective patients, the model had a sensitivity of 50.1% and a specificity of 94.7%. The mean volume of pneumoperitoneum was 37.4 cc for true positives with a maximum of 413.5 cc.
Conclusion
An artificial intelligence model was trained to quantify pneumoperitoneum on CT images and implemented in a real-time clinical system. To our knowledge, this is the first use of machine learning to identify pneumoperitoneum on CT images and perform worklist prioritization for patients based on its presence. This model is currently being expanded to identify additional types of free air such as pneumothorax, pneumomediastinum, and soft tissue gas.
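The 3 cc detection threshold implies converting a segmentation mask into a volume. A minimal sketch of that step follows, assuming a binary mask stored as nested lists and a known voxel spacing. The function names, the mask layout, and the use of "greater than or equal" at the threshold are all assumptions rather than details from the abstract.

```python
def gas_volume_cc(mask, spacing_mm):
    """Volume of segmented gas in cubic centimeters.

    mask: nested lists indexed [slice][row][col], 1 where gas was segmented.
    spacing_mm: (slice thickness, row spacing, column spacing) in millimeters.
    """
    dz, dy, dx = spacing_mm
    voxel_cc = dz * dy * dx / 1000.0  # 1 cc equals 1000 cubic millimeters
    voxels = sum(v for slice_ in mask for row in slice_ for v in row)
    return voxels * voxel_cc


def should_prioritize(volume_cc, threshold_cc=3.0):
    """Flag the study when the segmented volume reaches the threshold."""
    return volume_cc >= threshold_cc


# Tiny invented mask: 2 slices of 2 x 2 voxels, all marked as gas.
toy_mask = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
toy_volume = gas_volume_cc(toy_mask, (5.0, 10.0, 10.0))
```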
ACR Appropriateness Criteria® Chronic Foot Pain
Chronic foot pain is a frequent clinical complaint, which can significantly impact the quality of life in some individuals. These guidelines define best practices with regard to requisition of imaging studies based on specific clinical scenarios, which have been grouped into different variants. Each variant is accompanied by a brief description of the usefulness, advantages, and limitations of different imaging modalities. The present narrative is the result of an exhaustive assessment of the available literature and a thorough review process by a panel of experts on Musculoskeletal Imaging.
The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision include an extensive analysis of current medical literature from peer reviewed journals and the application of well-established methodologies (RAND/UCLA Appropriateness Method and Grading of Recommendations Assessment, Development, and Evaluation or GRADE) to rate the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where evidence is lacking or equivocal, expert opinion may supplement the available evidence to recommend imaging or treatment.
Classification of Endotracheal Tube Positioning on Chest XR using a Convolutional Neural Net Trained with Annotated Images
Endotracheal tube intubation is often used when patients are ill and require respiratory assistance. These tubes must be positioned properly in relation to the carina; too high and the lungs may not be ventilated, too low and only one lung may be ventilated. Our institution receives approximately 4,000 XR Chest images every day, 5% of which contain an endotracheal tube. If the tube is determined to be malpositioned by the reading radiologist, this information is relayed back to the site for tube adjustment. We hypothesized that by training a convolutional neural net using annotations of Chest XR images, we could localize both the endotracheal tube and the carina on prospective Chest XR data and use this information to classify images as having a malpositioned tube or not, along with the distance in cm that the tube must be adjusted if malpositioned.
Radiologist Opinions of a Quality Assurance Program: The Interaction Between Error, Emotion, and Preventative Action
Rationale and Objectives
To investigate inter-relationships between radiologist opinions of a quality assurance (QA) program, QA Committee communications, negative emotions, self-identified risk factors, and preventive actions taken following major errors.
Materials and Methods
A 48 question electronic survey was distributed to all 431 radiologists within the same teleradiology organization between June 15 and July 3, 2018. Two reminders were sent during the survey time period. Descriptive statistics were generated, and comparisons were made with Fisher exact test. Significance level was set at p < 0.05.
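The Fisher exact test named above can be written out for a 2x2 table using only the hypergeometric distribution. The sketch below is a generic two-sided version and not the study's analysis code; the example table is invented and does not come from the survey.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Small relative tolerance so equally likely tables are not lost to rounding.
    return sum(
        p
        for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Invented table, for illustration only.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```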
Results
Response rate was 67.5% (291/431), and 72.5% of respondents completed all survey questions. A total of 64.3% of respondents were male, and the highest proportion of radiologists (28.9%, 187/291) had been in practice >20 years. Preventative actions following an error were positively correlated to a higher opinion of the QA process, self-identification of personal risk factors for error, and greater negative emotions following an error (all p < 0.05). A higher opinion of communications with the QA committee was associated with a positive opinion of the QA process (p < 0.001). An inverse relationship existed between negative emotion and opinion of QA committee communications (p < 0.05) and negative emotion and opinion of the QA process (p < 0.05). Radiologist gender and full time versus part time status had a significant effect on perception of the QA process (p < 0.05).
Conclusion
Radiologist opinions of their institutional QA process were related to the number of negative emotions experienced and preventative actions taken following major errors. Nurturing trust and incorporating more positive feedback in the QA process may improve interactions with QA Committees and mitigate future errors.
Classification of Aortic Dissection and Rupture on Post-contrast CT Images Using a Convolutional Neural Network
Aortic dissections and ruptures are life-threatening injuries that must be immediately treated. Our national radiology practice receives dozens of these cases each month, but no automated process is currently available to check for critical pathologies before the images are opened by a radiologist. In this project, we developed a convolutional neural network model trained on aortic dissection and rupture data to assess the likelihood of these pathologies being present in prospective patients. This aortic injury model was used for study prioritization over the course of 4 weeks and model results were compared with clinicians’ reports to determine accuracy metrics. The model obtained a sensitivity and specificity of 87.8% and 96.0% for aortic dissection and 100% and 96.0% for aortic rupture. We observed a median reduction of 395 s in the time between study intake and radiologist review for studies that were prioritized by this model. False-positive and false-negative data were also collected for retraining to provide further improvements in subsequent versions of the model. The methodology described here can be applied to a number of modalities and pathologies moving forward.
Effect of intravenous contrast for CT abdomen and pelvis on detection of urgent and non-urgent pathology: can repeat CT within 72 hours be avoided?
Purpose
To determine if administering IV contrast for CT abdomen and pelvis improves detection of urgent and clinically important non-urgent pathology in patients with urgent clinical symptoms compared to patients not receiving IV contrast, and in turn to determine whether repeat CT exams on the same patient within 72 h were of low diagnostic benefit if the first CT was performed with IV contrast.
Methods
We evaluated 400 consecutive patients who had CT abdomen and pelvis (CT AP) examinations repeated within 72 h. For each patient, demographic data, reason for examination, examination time stamps, and examination technique were documented. CT AP radiology reports were reviewed and both urgent and non-urgent pathology was extracted.
Results
Of 400 patients, 63% had their initial CT AP without contrast. Administration of IV contrast for the first CT AP was associated with increased detection of urgent findings compared with non-contrast CT (p = 0.004) and a contrast-enhanced CT AP following an initial non-contrast CT AP examination better characterized both urgent (p = 0.002) and non-urgent findings (p < 0.001). Adherence to ACR appropriateness criteria for IV contrast administration was associated with increased detection of urgent pathology on the first CT (p = 0.02), and the second CT was more likely to be performed with IV contrast if recommended by the radiologist reading the first CT (p = 0.0006).
Conclusion
In the absence of contraindications, encouraging urgent care physicians to preferentially order IV contrast-enhanced CT AP examinations in adherence with ACR appropriateness criteria may increase detection of urgent pathology and avoid short-term repeat CT AP.
ACR Appropriateness Criteria® Shoulder Pain-Atraumatic
Shoulder pain is one of the most common reasons for musculoskeletal-related physician visits. Imaging plays an important role in identifying the specific cause of atraumatic shoulder pain. This review is divided into two parts. The first part provides a general discussion of various imaging modalities (radiographs, arthrography, nuclear medicine, ultrasound, CT, and MRI) and their usefulness in evaluating atraumatic shoulder pain. The second part focuses on the most appropriate imaging algorithms for specific shoulder conditions including: rotator cuff disorders, labral tear/instability, bursitis, adhesive capsulitis, biceps tendon abnormalities, postoperative rotator cuff tears, and neurogenic pain.
The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision include an extensive analysis of current medical literature from peer reviewed journals and the application of well-established methodologies (RAND/UCLA Appropriateness Method and Grading of Recommendations Assessment, Development, and Evaluation or GRADE) to rate the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where evidence is lacking or equivocal, expert opinion may supplement the available evidence to recommend imaging or treatment.
Expert Panel on Musculoskeletal Imaging. ACR Appropriateness Criteria® Chronic Wrist Pain
Radiographs are indicated as the first imaging test in all patients with chronic wrist pain, regardless of the suspected diagnosis. When radiographs are normal or equivocal, advanced imaging with MRI (with or without intravenous contrast or following arthrography), CT (usually without contrast), and ultrasound each has a role in establishing a diagnosis. Furthermore, these examinations may contribute to staging disease, treatment planning, and prognostication, even when radiographs are diagnostic of a specific condition. Which examination or examinations are best depends on the specific location of pain and the clinically suspected conditions.
The American College of Radiology Appropriateness Criteria are evidence-based guidelines for specific clinical conditions that are reviewed annually by a multidisciplinary expert panel. The guideline development and revision include an extensive analysis of current medical literature from peer reviewed journals and the application of well-established methodologies (RAND/UCLA Appropriateness Method and Grading of Recommendations Assessment, Development, and Evaluation or GRADE) to rate the appropriateness of imaging and treatment procedures for specific clinical scenarios. In those instances where evidence is lacking or equivocal, expert opinion may supplement the available evidence to recommend imaging or treatment.
The Benefit of a Triage System to Expedite Acute Stroke Head Computed Tomography Interpretations
Background and purpose
We developed and tested a triage system to accelerate the interpretation of stroke head computed tomographies (CTs), with the goal of optimizing the time available for acute stroke therapy.
Materials and methods
In our practice, acute stroke protocol head CTs have been given the highest reading priority. We implemented a technologically enabled prioritization infrastructure to consistently present these critical cases to our radiologists so they are evaluated before other examinations. In our 1-year retrospective multicenter study of 350,495 head CT examinations, we compared the reading time of stroke protocol head CTs to our next highest priority head CT.
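The abstract does not describe how the prioritization infrastructure is built. As a minimal sketch of the general idea, a worklist can be modeled as a priority queue in which stroke protocol studies always surface first and ties are broken by arrival order. The class name and the priority levels below are hypothetical.

```python
import heapq
import itertools

# Hypothetical priority levels; lower numbers are read first.
STROKE_PROTOCOL = 0
STAT = 1
ROUTINE = 2


class Worklist:
    """Reading worklist ordered by priority, then by arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker keeps first-in first-out

    def add(self, exam_id, priority):
        heapq.heappush(self._heap, (priority, next(self._arrival), exam_id))

    def next_exam(self):
        """Remove and return the highest-priority, earliest-arrived exam."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


worklist = Worklist()
worklist.add("routine-head-ct", ROUTINE)
worklist.add("stat-head-ct", STAT)
worklist.add("stroke-ct-1", STROKE_PROTOCOL)
worklist.add("stroke-ct-2", STROKE_PROTOCOL)
reading_order = [worklist.next_exam() for _ in range(len(worklist))]
```

The arrival counter matters: without it, two studies with equal priority would be compared by their identifiers, so the older study would not reliably be read first.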
Results
Our average acute stroke head CT reading turnaround time was 6.5 minutes. This represented a 17.3-minute improvement over the next highest priority head CT in our practice (confidence interval: 17.2-17.4 minutes, P < .001).
Conclusions
A technologically enabled acute stroke protocol CT triage system consistently improves the reading times of critically time-dependent head CT examinations. As a result, this system has the potential to improve treatment times, treatment eligibility, and clinical outcomes.
Effect of Shift, Schedule, and Volume on Interpretive Accuracy: A Retrospective Analysis of 2.9 Million Radiologic Examinations
Purpose
To determine whether there is an association between radiologist shift length, schedule, or examination volume and interpretive accuracy.
Materials and Methods
This study was institutional review board approved and HIPAA compliant. A retrospective analysis of all major discrepancies from a 2015 quality assurance database of a teleradiology practice was performed. Board-certified radiologists provided initial preliminary interpretations. Discrepancies were identified during a secondary review by a practicing radiologist or through an internal quality assurance process and were vetted through a consensus radiology quality assurance committee. Unique anonymous radiologist identifiers were used to link the discrepancies to radiologists’ shifts and schedules. Data were analyzed by using analysis of variance, t test, or χ² test.
Results
A total of 4294 major discrepancies resulted from 2,922,377 examinations (0.15%). There was a significant difference for shift length (P < .0001) and volume (P < .0001) for shifts with versus those without discrepancies. Errors occurred a mean (± standard deviation) of 8.97 hours ± 2.28 into the shift (median, 10 hours; interquartile range, 2.0 hours). Significantly more errors occurred late in shifts than early (P < .0001), peaking between 10 and 12 hours. The number of major discrepancies in a single shift ranged from one to four, with a significant difference in the number of discrepancies as a function of study volume (volume for all shifts, 67.60 ± 60.24; volume for shifts with major discrepancies, 118.96 ± 66.89; P < .001). Despite a trend for more discrepancies after more consecutive days worked, the difference was not significant (P = .0893).
Conclusion
Longer shifts and higher diagnostic examination volumes are associated with increased major interpretive discrepancies. These are more likely to occur later in a shift, peaking after the 10th hour of work.
Emergency Radiology Practice Patterns: Shifts, Schedules, and Job Satisfaction
Purpose
To assess the practice environment of emergency radiologists with a focus on schedule, job satisfaction, and self-perception of health, wellness, and diagnostic accuracy.
Methods
A survey drawing from prior radiology and health care shift-work literature was distributed via e-mail to national societies, teleradiology groups, and private practices. The survey remained open for 4 weeks in 2016, with one reminder. Data were analyzed using hypothesis testing and logistic regression modeling.
Results
Response rate was 29.6% (327/1106); 69.1% of respondents (n = 226) were greater than 40 years old, 73% (n = 240) were male, and 87% (n = 284) practiced full time. With regard to annual overnight shifts (NS): 36% (n = 118) did none, 24.9% (n = 81) did 182 or more, and 15.6% (n = 51) did 119. There was a significant association between average NS worked per year and both perceived negative health effects (P < .01) and negative impact on memory (P < .01). There was an inverse association between overall job enjoyment and number of annual NS (P < .05). The odds of agreeing to the statement “I enjoy my job” for radiologists who work no NS is 2.21 times greater than for radiologists who work at least 119 NS, when shift length is held constant. Radiologists with 11+ years of experience who work no NS or 1 to 100 NS annually have lower odds of feeling overwhelmed when compared with those working the same number of NS with <10 years’ experience.
Conclusion
There is significant variation in emergency radiology practice patterns. Annual NS burden is associated with lower job satisfaction and negative health self-perception.