Accelerated evidence synthesis in orthopaedics—the roles of natural language processing, expert annotation and large language models

© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Journal of Experimental Orthopaedics

In an era of electronic medical records, rapidly expanding publication rates of medical knowledge, and large-scale registries, orthopaedics is in dire need of innovative approaches to facilitate the adoption of the latest knowledge in clinical practice. While machine learning (ML) has been heralded as one solution to many research tasks hampered by previous technological limitations [12], there is an increasing need to direct our attention towards subdomains of ML that are suited to extracting meaningful clinical information stored in medical records. We believe natural language processing (NLP) to be one such domain of ML, with immense future potential to catalyse rate-limiting steps in orthopaedic research.

Fundamental concepts
Natural language processing is an ML-based tool that involves quantitative encoding of information derived from human language. Data generated by speech- and text-processing NLP algorithms can be used to solve a variety of tasks with broad applications in medical practice and research. Because examples of NLP-based research in orthopaedics remain limited [3,15], commonly used NLP tasks are best illustrated with examples of their potential applications across medical fields:
• Text classification: categorisation and clustering of scientific articles by level of evidence and/or sub-topic, detected by screening abstracts for relevant terms.
• Information extraction: identification of information related to patients, interventions, comparisons, and outcome variables (PICO elements) [2] from electronic medical records (EMR) and publications using, for example, named entity recognition (NER).
• Question answering: automated responses to frequently asked questions, using a custom medical knowledge base to generate conversational layers.
• Sentiment analysis: assessment of patients' emotions and opinions about a medical service based on analysis of the affective qualities of written reviews [4].
• Summarization: condensing a large volume of medical evidence into a short summary containing essential, easy-to-understand information for patients.
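The text-classification task can be illustrated with a minimal keyword screen over abstracts. This is a sketch under stated assumptions: the topic names and term lists in the hypothetical `TOPIC_TERMS` dictionary are placeholders rather than an expert-agreed vocabulary, whole-word matching stands in for a trained classifier, and only sub-topic tagging (not level-of-evidence grading) is shown.

```python
import re

# Hypothetical topic vocabulary; in practice, expert consensus would define these terms.
TOPIC_TERMS = {
    "acl": ["anterior cruciate ligament", "acl"],
    "shoulder": ["rotator cuff", "glenohumeral", "shoulder"],
    "hip": ["hip arthroplasty", "femoroacetabular", "hip"],
}


def classify_abstract(abstract: str) -> list[str]:
    """Return every sub-topic with at least one term present as a whole word or phrase."""
    text = abstract.lower()
    labels = []
    for topic, terms in TOPIC_TERMS.items():
        # \b anchors stop short terms such as "hip" from matching inside words like "chip".
        if any(re.search(rf"\b{re.escape(term)}\b", text) for term in terms):
            labels.append(topic)
    return labels


abstracts = [
    "Outcomes after ACL reconstruction with hamstring autograft",
    "Rotator cuff repair combined with hip arthroscopy",
    "Tibial plateau fracture fixation",
]
for abstract in abstracts:
    print(classify_abstract(abstract), "<-", abstract)
```

Run as written, the screen tags the first abstract as `acl`, the second as both `shoulder` and `hip`, and leaves the third untagged. A keyword screen like this is transparent and easy for clinicians to audit, but it cannot resolve context (for example negation or ambiguous abbreviations), which is where trained NLP models are expected to help.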
Understanding the inner workings and performance of ML models is a key step in identifying applications for NLP in orthopaedic research [10]. Accuracy (the proportion of all predictions that are correct), precision (the positive predictive value), recall (sensitivity), and the F1 score (the harmonic mean of precision and recall) are key metrics used in the evaluation and interpretation of NLP models.
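To make the distinction between these metrics concrete, the sketch below computes them from the counts of a hypothetical entity-extraction evaluation against an expert reference annotation. The counts are invented for illustration. For entity extraction, true negatives are often not well defined, which is one reason precision, recall, and F1 tend to be reported instead of accuracy.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from comparing model output with an expert reference annotation."""
    tp: int      # true positives: items found by the model that the reference also contains
    fp: int      # false positives: items found by the model that the reference lacks
    fn: int      # false negatives: reference items the model missed
    tn: int = 0  # true negatives: often undefined for entity extraction, so default to 0


def precision(c: ConfusionCounts) -> float:
    """Positive predictive value: share of the model's positives that are correct."""
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0


def recall(c: ConfusionCounts) -> float:
    """Sensitivity: share of reference positives that the model recovered."""
    return c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if p + r else 0.0


def accuracy(c: ConfusionCounts) -> float:
    """Share of all decisions, positive and negative, that are correct."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Hypothetical evaluation of an extractor on expert-labelled sentences.
counts = ConfusionCounts(tp=80, fp=20, fn=40, tn=60)
print(f"precision={precision(counts):.2f} recall={recall(counts):.2f} "
      f"F1={f1_score(counts):.2f} accuracy={accuracy(counts):.2f}")
```

With these counts the script prints `precision=0.80 recall=0.67 F1=0.73 accuracy=0.70`: the model is right most of the time when it flags an entity, yet misses a third of the entities the experts annotated, a gap that accuracy alone would hide.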

Barriers to automated data extraction
While there is no shortage of available data for orthopaedic research, a major barrier to accessing these data is their storage as unstructured text. A previously published editorial outlined the discrepancy between the publication rate of primary research articles and the synthesis of up-to-date evidence in the form of systematic reviews and meta-analyses [18]. To tackle this problem, the concept of living evidence synthesis was proposed, which relies largely on NLP for near real-time extraction and compilation of relevant medical data. Additionally, the widespread adoption of EMRs by healthcare systems across the globe provides a wealth of untapped medical knowledge in the form of de-identified patient data. Unfortunately, the lack of standardization and consistency in medical documentation makes the automated extraction of relevant and accurate information difficult. Early results show improved performance in clinical predictions when structured EMR data are complemented with NLP analysis of unstructured EMR text [13]. While both supervised [9] and unsupervised [1] ML approaches are available for NLP, information extraction from medical text is likely to benefit from context-specific interpretation. Problematically, medical text is heterogeneous in structure and style, with vast potential for syntactic and semantic variability (such as abbreviations), which in turn leads to ambiguous interpretation by both humans and computers [7]. Designing automated frameworks for reliable entity and pattern recognition in such complex environments is a critical challenge to overcome.
Supervised ML methods using labelling instructions agreed upon by domain experts may reduce annotation errors and lead to higher-quality information extraction from context-specific text data [11]. For example, a panel of experts in ACL surgery could develop labelling instructions and benchmarks for extracting data on postoperative outcomes after ACL reconstruction from medical records. The panel would need to reach consensus on the essential components to label, such as graft tunnel placement, graft choice and thickness, and the presence or absence of anterolateral augmentation, among others. Labelling instructions would thereby help establish benchmarks for consistency and reproducibility in NLP-driven research, and maximize the quality of evidence synthesis across the international orthopaedic community. It is important to point out that the clinical utility of AI systems depends heavily on the quantity and quality of training data, which raises concerns regarding ethical and secure access to patient information.
Consequently, future efforts will also require carefully planned regulatory supervision to safeguard the national and international distribution of patient data extracted from medical records with NLP [5].
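Whether expert labelling instructions actually improve consistency can be checked by measuring agreement between annotators who label the same records. The sketch below uses Cohen's kappa, a swapped-in choice since the text does not name a specific statistic, for two annotators labelling graft choice in six operative notes; the label values and annotations are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same, non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled at random with their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label for every item; kappa is undefined,
        # and this sketch reports perfect agreement in that case.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical graft-choice labels assigned by two annotators to six operative notes.
annotator_a = ["hamstring", "hamstring", "BTB", "quadriceps", "BTB", "hamstring"]
annotator_b = ["hamstring", "BTB", "BTB", "quadriceps", "BTB", "hamstring"]
print(round(cohens_kappa(annotator_a, annotator_b), 3))
```

Here the annotators agree on five of six notes (about 83%), but kappa is 17/23, printed as `0.739`, because it discounts the agreement expected by chance given each annotator's label frequencies. Tracking such a statistic before and after revising labelling instructions is one way a panel could turn "consistency" into a measurable benchmark.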

Condition-specific annotation and NLP frameworks
The use of standardized knowledge bases is essential for the design and implementation of NLP algorithms intended for specific research purposes. We believe the next step towards solving the challenges associated with information extraction is to establish comprehensive knowledge bases of annotated disease- or injury-specific medical text. This idea rests on the principle that an NLP model, even one pre-trained for general language understanding, is more likely to perform well when trained on a body of domain-specific information with expert-level annotation and abstraction of the key elements in the text. A recent study of biomedical image analysis found that improvements in labelling instructions have an immense impact on interrater variability in the quality and consistency of annotations, and consequently on the performance of the final algorithm [11]. Similarly, clearly formulated instructions established by domain experts may mitigate some of the errors pervasive in labelling that arise from time pressure, variability in motivation, differences in knowledge or style, and interpretation of the text [7]. Importantly, expert annotation of training data for a given area of orthopaedics should focus on creating a consistent and replicable framework for NLP application, which clearly distinguishes entities, relationships between different entities, and multiple attributes specific to individual entities [17]. This approach could then be considered a standard operating procedure for reliable and accurate extraction of essential medical information from medical charts and primary research articles (Fig. 1). Consequently, we propose the creation of annotated collections of scientific text based on expert consensus, specific to musculoskeletal conditions affecting the spine, shoulder, hip, knee, and ankle joints, to expedite data extraction and the synthesis of up-to-date evidence using NLP tools. Due to the inherent complexity of the task, the annotation of medical knowledge will require interdisciplinary cooperation between healthcare professionals, linguists, and computer scientists.
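One way to keep entities, relationships between entities, and entity-specific attributes distinct is a standoff representation, in which labels point to character offsets in the unchanged source text. The sketch below assumes that representation; the class names, entity types (`Procedure`, `Graft`), relation label (`uses_graft`), and attribute keys are hypothetical rather than taken from an established annotation scheme, and the sentence is invented to mirror the ACL components mentioned earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: str
    label: str   # entity type, e.g. "Procedure" or "Graft"
    start: int   # character offset of the first character of the span
    end: int     # character offset one past the last character of the span
    attributes: dict[str, str] = field(default_factory=dict)  # entity-specific properties


@dataclass
class Relation:
    label: str   # relation type linking two entities
    head: str    # id of the source entity
    tail: str    # id of the target entity


@dataclass
class AnnotatedDocument:
    text: str
    entities: list[Entity] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def span(self, entity_id: str) -> str:
        """Return the source text covered by an entity."""
        entity = next(e for e in self.entities if e.id == entity_id)
        return self.text[entity.start:entity.end]

    def validate(self) -> None:
        """Raise ValueError if offsets fall outside the text or a relation points nowhere."""
        ids = {e.id for e in self.entities}
        for e in self.entities:
            if not 0 <= e.start < e.end <= len(self.text):
                raise ValueError(f"entity {e.id} has invalid offsets")
        for r in self.relations:
            if r.head not in ids or r.tail not in ids:
                raise ValueError(f"relation {r.label} refers to an unknown entity")


# Hypothetical annotation of an invented operative-note sentence.
doc = AnnotatedDocument(
    text="ACL reconstruction with an 8 mm hamstring autograft.",
    entities=[
        Entity("T1", "Procedure", 0, 18),
        Entity("T2", "Graft", 32, 51, {"graft_type": "hamstring", "thickness_mm": "8"}),
    ],
    relations=[Relation("uses_graft", head="T1", tail="T2")],
)
doc.validate()
for e in doc.entities:
    print(e.id, e.label, repr(doc.span(e.id)), e.attributes)
```

Graft thickness is stored here as an attribute of the graft entity rather than as a separate entity. Choices of this kind are exactly what expert labelling instructions would need to fix in advance so that different annotators encode the same note the same way.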

The potential of large language models
Over the past year, large language models (LLMs) such as GPT-4 [8] and Med-PaLM 2 [14], among others, have showcased the revolutionary impact that medical question answering with generative AI (GAI) can have on the healthcare sector. Expert-annotated, foundational datasets designed for NLP tasks may be integrated with LLMs to perform a variety of tasks, expediting orthopaedic research, the appraisal of existing evidence, and the delivery of orthopaedic care in the clinic. Annotation of important clinical concepts and their relations in electronic health records (EHRs), operative notes, radiology notes, and research studies based on semantic similarity may be used to train LLMs to perform clinically useful tasks with high efficiency and accuracy [16]. Additionally, GAI may be applied more broadly, with the capability to interpret multimodal, domain-specific information, including labelled or unlabelled medical images, patient interviews, and patient-reported outcome data in the context of complex clinical scenarios [6]. Harnessing the potential of LLMs and GAI may catalyse the development of clinical decision-support tools to optimize the quality of treatment for patients with orthopaedic conditions. Such endeavours require strict emphasis on the quality of the data used to build foundational training datasets, which necessitates expert consensus to set standards for the information used to design systems with advanced medical reasoning capabilities.
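As a rough illustration of pairing expert-annotated data with an LLM, the sketch below selects the annotated examples most similar to a new operative note and assembles them into a few-shot extraction prompt. Several assumptions apply: the annotated examples and JSON fields are hypothetical; bag-of-words cosine similarity stands in for the embedding-based semantic similarity a real system would more likely use; and this is in-context prompting at inference time, not training. The code only builds the prompt, since no particular LLM or API is implied here.

```python
import math
import re
from collections import Counter

# Hypothetical expert-annotated examples: (note text, reference concepts as JSON).
ANNOTATED_EXAMPLES = [
    ("Revision ACL reconstruction with quadriceps tendon autograft.",
     '{"procedure": "revision ACL reconstruction", "graft": "quadriceps tendon autograft"}'),
    ("Reverse total shoulder arthroplasty for cuff tear arthropathy.",
     '{"procedure": "reverse total shoulder arthroplasty", "indication": "cuff tear arthropathy"}'),
    ("Primary ACL reconstruction using hamstring autograft and lateral extra-articular tenodesis.",
     '{"procedure": "primary ACL reconstruction", "graft": "hamstring autograft", '
     '"augmentation": "lateral extra-articular tenodesis"}'),
]


def bag_of_words(text: str) -> Counter:
    """Lower-cased token counts; a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_prompt(note: str, k: int = 2) -> str:
    """Assemble a few-shot extraction prompt from the k most similar annotated examples."""
    query = bag_of_words(note)
    ranked = sorted(ANNOTATED_EXAMPLES,
                    key=lambda example: cosine(query, bag_of_words(example[0])),
                    reverse=True)
    parts = ["Extract the clinical concepts from the operative note as JSON."]
    for text, concepts in ranked[:k]:
        parts.append(f"Note: {text}\nConcepts: {concepts}")
    # The new note goes last, leaving its concepts for the model to complete.
    parts.append(f"Note: {note}\nConcepts:")
    return "\n\n".join(parts)


print(build_prompt("ACL reconstruction with bone-patellar tendon-bone autograft."))
```

For the ACL note above, the two ACL examples are selected and the shoulder example is left out, so the model sees demonstrations that share vocabulary and structure with the task at hand. The quality of such demonstrations depends directly on the expert annotations behind them.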

Conclusion
We believe the adoption of NLP frameworks to be one of the key steps in the evolution of medical data extraction and evidence synthesis. There is currently a need for innovative solutions to obtain meaningful information from the growing body of structured and unstructured medical text, with the goal of improving the quality of patient care. Considering the immense potential in the clinical and research settings, there is a growing need for dedicated training of healthcare professionals in the fundamental concepts and applications of AI. The annotation of condition-specific training data and the design of efficient NLP pipelines are complex tasks that require close collaboration between the healthcare and technology sectors to establish high-quality, scalable systems despite existing disparities across the global healthcare sector. Rather than being solely end-users of AI systems, healthcare professionals should take a more active role in developing frameworks for specific aspects of orthopaedic research and clinical care. Finally, expert consensus is required to integrate labelled and unlabelled orthopaedic datasets to train LLMs and GAI models to perform domain-specific tasks, such as clinical concept extraction, medical relation extraction, and medical question answering, with high efficiency, accuracy, and reliability.

Fig. 1 Key steps in the collaborative collection, annotation, and extraction of medical data for living evidence synthesis and integration with LLMs