Accelerated evidence synthesis in orthopaedics—the roles of natural language processing, expert annotation and large language models

Zsidai, Bálint; Kaarre, Janina; Hilkert, Ann-Sophie; Narup, Eric; Senorski, Eric Hamrin; Grassi, Alberto; Ayeni, Olufemi R.; Musahl, Volker; Ley, Christophe; Herbst, Elmar; Hirschmann, Michael T.; Kopf, Sebastian; Seil, Romain; Tischer, Thomas; Samuelsson, Kristian; Feldt, Robert

doi:10.1186/s40634-023-00662-4

Editorial Note
Open access
Published: 28 September 2023

Bálint Zsidai^1,2 (ORCID: orcid.org/0000-0002-5697-6577),
Janina Kaarre^1,2,3,
Ann-Sophie Hilkert^4,5,
Eric Narup^1,2,
Eric Hamrin Senorski^1,6,7,
Alberto Grassi^8,
Olufemi R. Ayeni^9,
Volker Musahl^2,3,
Christophe Ley^10,
Elmar Herbst^11,
Michael T. Hirschmann^12,
Sebastian Kopf^13,14,
Romain Seil^15,
Thomas Tischer^16,
Kristian Samuelsson^1,2,17,
Robert Feldt^4 &
ESSKA Artificial Intelligence Working Group

Journal of Experimental Orthopaedics volume 10, Article number: 99 (2023)

In an era of electronic medical records, rapidly expanding publication rates of medical knowledge, and large-scale registries, orthopaedics is in dire need of innovative approaches to facilitate the adoption of the latest knowledge in clinical practice. While machine learning (ML) has been heralded as one solution to many research tasks hampered by previous technological limitations [12], there is an increasing need to direct our attention towards subdomains of ML that are well suited to the extraction of meaningful clinical information stored in medical records. We believe natural language processing (NLP) to be one such domain of ML, with immense future potential to catalyse rate-limiting steps in orthopaedic research.

Fundamental concepts

Natural language processing is an ML-based tool that involves quantitative encoding of information derived from human language. Data generated from speech- and text-processing NLP algorithms can be used to solve a variety of tasks with broad applications in medical practice and research. Because examples of NLP-based research in orthopaedics remain limited [3, 15], commonly used NLP tasks are best illustrated with examples of their potential applications across medical fields:

Text classification – Categorisation and clustering of scientific articles based on level of evidence and/or sub-topics, detected using abstract screening for relevant terms.
Information extraction – Identification of information related to patients, interventions, comparisons, and outcome variables (PICO elements) [2] from electronic medical records (EMR) and publications using, for example, named entity recognition (NER).
Question answering – Automated responses to frequently asked questions with a custom medical knowledge base used to generate conversational layers.
Sentiment analysis – Assessment of the emotions and opinions of patients about a medical service based on analysis of the affective qualities of written reviews [4].
Summarization – Abstraction of a large volume of medical evidence to generate a short summary with essential and easy-to-understand information for patients.
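To make the first task in the list above more concrete, the following is a minimal sketch of abstract screening by relevant terms. It is a toy, rule-based stand-in for text classification, not a trained model. The topic vocabulary, the function name `screen_abstract` and the `min_hits` threshold are all hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, List

# Hypothetical screening vocabulary: each topic maps to terms that
# a reviewer might look for when screening abstracts.
TOPIC_TERMS: Dict[str, List[str]] = {
    "ACL reconstruction": ["anterior cruciate ligament", "acl reconstruction", "graft"],
    "hip arthroplasty": ["total hip arthroplasty", "hip replacement", "acetabular"],
}


def screen_abstract(abstract: str, min_hits: int = 2) -> List[str]:
    """Return the topics whose vocabulary matches the abstract at least min_hits times.

    Each term is counted once, matched case-insensitively on word boundaries.
    """
    text = abstract.lower()
    matched_topics = []
    for topic, terms in TOPIC_TERMS.items():
        hits = sum(
            1 for term in terms
            if re.search(r"\b" + re.escape(term) + r"\b", text)
        )
        if hits >= min_hits:
            matched_topics.append(topic)
    return matched_topics


example = (
    "We compared hamstring and patellar tendon graft outcomes "
    "after anterior cruciate ligament reconstruction."
)
print(screen_abstract(example))
```

A keyword rule of this kind is transparent but brittle. It misses synonyms and abbreviations, which is one reason statistical and neural classifiers are generally preferred for screening at scale.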

Understanding the inner workings and performance of ML models is a key step in identifying applications for NLP in orthopaedic research [10]. Accuracy (the proportion of all predictions that are correct), precision (positive predictive value), recall (sensitivity) and the F₁ score (the harmonic mean of precision and recall) are key metrics used in the evaluation and interpretation of NLP models.
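The four metrics can be computed directly from the counts of true and false positives and negatives obtained when model output is compared with expert labels. The sketch below uses the standard definitions. The class name `ConfusionCounts` and the example counts are hypothetical and do not come from any study cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing model predictions with expert (gold) labels."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def accuracy(c: ConfusionCounts) -> float:
    """Proportion of all predictions that are correct."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """Positive predictive value: correct positives among predicted positives."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: ConfusionCounts) -> float:
    """Sensitivity: correct positives among actual positives."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for an entity-recognition task on 200 text spans.
counts = ConfusionCounts(tp=40, fp=10, fn=20, tn=130)
print(f"accuracy={accuracy(counts):.3f} precision={precision(counts):.3f} "
      f"recall={recall(counts):.3f} F1={f1_score(counts):.3f}")
```

With these counts the accuracy is 0.85 while recall is only about 0.67. This shows why accuracy alone can be misleading when the entity of interest is rare in the text.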

Barriers to automated data extraction

While there is no shortage of available data for orthopaedic research, a major barrier to its accessibility is that much of it is stored as unstructured text. A previously published editorial outlined the discrepancy between the publication rate of primary research articles and the synthesis of up-to-date evidence in the form of systematic reviews and meta-analyses [18]. Consequently, the concept of living evidence synthesis, which largely relies on NLP for near real-time extraction and compilation of relevant medical data, was proposed to tackle this problem. Additionally, the widespread adoption of EMRs by healthcare systems across the globe provides a wealth of untapped medical knowledge in the form of deidentified patient data. Unfortunately, the lack of standardization and consistency in medical documentation poses difficulties for the automated extraction of relevant and accurate information. Early results show improved performance in clinical predictions when structured EMR data is complemented with NLP analysis of unstructured EMR text [13]. While both supervised [9] and unsupervised [1] ML approaches are available for NLP, information extraction from medical text is likely to benefit from context-specific interpretation. Problematically, medical text is heterogeneous in structure and style, with vast syntactic and semantic variability (such as abbreviations), which in turn leads to ambiguous interpretation by both humans and computers [7]. The design of automated frameworks for reliable entity and pattern recognition in such complex environments is a critical challenge to overcome. Supervised ML methods using labelling instructions agreed upon by domain experts may reduce annotation errors, and lead to a higher quality of information extraction from context-specific text data [11].
For example, a panel of experts in ACL surgery could develop labelling instructions and benchmarks for extracting data from medical records regarding postoperative outcomes after ACL reconstruction. The panel would need to reach a consensus on the essential components to label, such as graft tunnel placement, graft choice and thickness, and the presence or absence of anterolateral augmentation, among others. Labelling instructions would thereby help establish benchmarks for consistency and reproducibility in NLP-driven research, and maximize the quality of evidence synthesis across the international orthopaedic community. It is important to point out that the clinical utility of AI systems depends heavily on the magnitude and quality of training data, which raises concerns regarding ethical and secure access to patient information. Consequently, future efforts will also require carefully planned regulatory supervision to safeguard the national and international distribution of patient data extracted from medical records with NLP [5].
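One way to picture the outcome of such a consensus is as a fixed annotation schema that every annotator fills in for each record. The sketch below is only an illustration under stated assumptions. The class names, field names, the graft categories and the validation rules are hypothetical and are not a schema proposed by the authors or by any expert panel.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import List, Optional


class GraftChoice(Enum):
    """Hypothetical label set; a real panel would define its own categories."""
    HAMSTRING_TENDON = "hamstring tendon autograft"
    PATELLAR_TENDON = "bone-patellar tendon-bone autograft"
    QUADRICEPS_TENDON = "quadriceps tendon autograft"
    ALLOGRAFT = "allograft"


@dataclass
class ACLReconstructionLabel:
    """One annotator's labels for one operative note (None = not documented)."""
    record_id: str
    graft_choice: Optional[GraftChoice] = None
    graft_thickness_mm: Optional[float] = None
    tunnel_placement: Optional[str] = None  # text span copied from the note
    anterolateral_augmentation: Optional[bool] = None

    def missing_fields(self) -> List[str]:
        """Names of components the annotator could not find in the note."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def validate(self) -> List[str]:
        """Basic consistency checks that labelling instructions might require."""
        problems = []
        if self.graft_thickness_mm is not None and self.graft_thickness_mm <= 0:
            problems.append("graft_thickness_mm must be positive")
        if self.tunnel_placement is not None and not self.tunnel_placement.strip():
            problems.append("tunnel_placement is empty; use None when not documented")
        return problems


label = ACLReconstructionLabel(
    record_id="note-001",  # hypothetical identifier
    graft_choice=GraftChoice.HAMSTRING_TENDON,
    graft_thickness_mm=8.0,
    anterolateral_augmentation=False,
)
print(label.missing_fields())
print(label.validate())
```

Distinguishing "not documented" (`None`) from an explicit negative finding (`False`) is a small design choice, but it is the kind of detail that labelling instructions need to settle before annotation begins.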

Condition-specific annotation and NLP frameworks

The use of standardized knowledge bases is essential for the design and implementation of NLP algorithms intended for specific research purposes. We believe the next step towards solving the challenges associated with information extraction is to establish comprehensive knowledge bases of annotated disease- or injury-specific medical text. This idea rests on the principle that an NLP model is more likely to perform well when trained on a body of domain-specific information, with expert-level annotation and abstraction of the key elements in the text, even if it has been pre-trained for general language understanding. A recent study of biomedical image analysis determined that improvements in labelling instructions have an immense impact on the interrater variability in the quality and consistency of annotations, and consequently, on the performance of the final algorithm [11]. Similarly, clearly formulated instructions established by domain experts may mitigate some of the pervasive labelling errors that arise from time pressure, variability in motivation, differences in knowledge or style, and interpretation of the text [7]. Importantly, expert annotation of training data for a given area of orthopaedics should focus on creating a consistent and replicable framework for NLP application, which clearly distinguishes entities, relationships between different entities, and multiple attributes specific to individual entities [17]. This approach could then be considered a standard operating procedure for reliable and accurate extraction of essential medical information from medical charts and primary research articles (Fig. 1). Consequently, we propose the creation of annotated collections of scientific text based on expert consensus, specific to musculoskeletal conditions affecting the spine, shoulder, hip, knee, and ankle joints, to expedite data extraction and the synthesis of up-to-date evidence using NLP tools.
Due to the inherent complexity of the task, the annotation of medical knowledge will require the interdisciplinary cooperation of healthcare professionals, linguists, and computer scientists.
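Interrater variability of the kind discussed above is commonly quantified with a chance-corrected agreement statistic. The sketch below computes Cohen's kappa for two annotators who labelled the same text spans. The choice of kappa is our illustrative assumption rather than a method prescribed in the cited studies, and the label names and example annotations are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each annotator's label frequencies.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[k] * freq_b[k] for k in categories) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical entity labels assigned to the same eight text spans.
annotator_1 = ["GRAFT", "GRAFT", "TUNNEL", "OTHER", "GRAFT", "TUNNEL", "OTHER", "OTHER"]
annotator_2 = ["GRAFT", "TUNNEL", "TUNNEL", "OTHER", "GRAFT", "TUNNEL", "OTHER", "GRAFT"]
print(f"kappa = {cohens_kappa(annotator_1, annotator_2):.3f}")
```

Here the annotators agree on six of eight spans (75%), but kappa is lower, about 0.63, because part of that agreement would be expected by chance. Tracking such a statistic before and after revising labelling instructions is one way to check whether the revision actually improved consistency.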

The potential of large language models

Over the past year, large language models (LLMs) such as GPT-4 [8] and Med-PaLM 2 [14], among others, have showcased the revolutionary impact of medical question-answering with generative AI (GAI) on the healthcare sector. Expert-annotated, foundational datasets designed for NLP tasks may be integrated with LLMs to perform a variety of tasks, expediting orthopaedic research, the appraisal of existing evidence and the delivery of orthopaedic care in the clinic. Annotation of important clinical concepts and their relations in electronic health records (EHRs), operative notes, radiology notes, and research studies based on semantic similarity may be used to train LLMs for performing clinically useful tasks with high efficiency and accuracy [16]. Additionally, GAI may be applied in a broader sense, with the capability to interpret multimodal, domain-specific information, including labelled or unlabelled medical images, patient interviews and patient-reported outcome data in the context of complex clinical scenarios [6]. Harnessing the potential of LLMs and GAI may catalyse the development of clinical decision-support tools to optimize the quality of treatment for patients with orthopaedic conditions. Such endeavours require strict emphasis on the quality of the foundational datasets used for training, which necessitates expert consensus to lay out standards for the information used to design systems with advanced medical reasoning capabilities.

Conclusion

We believe the adoption of NLP frameworks to be one of the key steps in the evolution of medical data extraction and evidence synthesis. There is currently a need for innovative solutions to obtain meaningful information from the growing availability of structured and unstructured medical text, with the goal of improving the quality of patient care. Considering the immense potential in the clinical and research setting, there is a growing need for the dedicated training of healthcare professionals in the fundamental concepts and applications of AI. The annotation of condition-specific training data and design of efficient NLP pipelines are complex tasks, which require close collaboration between the healthcare and technology sectors to establish high-quality and scalable systems despite existing disparities across the global healthcare sector. Rather than solely being the end-users of AI systems, healthcare professionals should take a more active role in the development of frameworks for specific aspects of orthopaedic research and clinical care. Finally, expert consensus is required to integrate labelled and unlabelled orthopaedic datasets to train LLMs and GAI models to perform domain-specific tasks, such as clinical concept extraction, medical relation extraction, and medical question answering, with high efficiency, accuracy and reliability.

Availability of data and materials

Not applicable.

References

1. Eckhardt CM, Madjarova SJ, Williams RJ, Ollivier M, Karlsson J, Pareek A et al (2023) Unsupervised machine learning methods and emerging applications in healthcare. Knee Surg Sports Traumatol Arthrosc 31:376–381
2. Jin D, Szolovits P (2020) Advancing PICO element detection in biomedical text via deep neural networks. Bioinformatics 36:3856–3862
3. Karhade AV, Bongers MER, Groot OQ, Kazarian ER, Cha TD, Fogel HA et al (2020) Natural language processing for automated detection of incidental durotomy. Spine J 20:695–700
4. Langerhuizen DWG, Brown LE, Doornberg JN, Ring D, Kerkhoffs G, Janssen SJ (2021) Analysis of online reviews of orthopaedic surgeons and orthopaedic practices using natural language processing. J Am Acad Orthop Surg 29:337–344
5. Mesko B, Topol EJ (2023) The imperative for regulatory oversight of large language models (or generative AI) in healthcare. NPJ Digit Med 6:120
6. Moor M, Banerjee O, Abad ZSH, Krumholz HM, Leskovec J, Topol EJ et al (2023) Foundation models for generalist medical artificial intelligence. Nature 616:259–265
7. Northcutt CG, Athalye A, Mueller J (2021) Pervasive label errors in test sets destabilize machine learning benchmarks. arXiv preprint arXiv:2103.14749
8. OpenAI (2023) GPT-4 Technical Report. https://arxiv.org/abs/2303.08774
9. Pruneski JA, Pareek A, Kunze KN, Martin RK, Karlsson J, Oeding JF et al (2023) Supervised machine learning and associated algorithms: applications in orthopedic surgery. Knee Surg Sports Traumatol Arthrosc 31(4):1196–1202
10. Pruneski JA, Pareek A, Nwachukwu BU, Martin RK, Kelly BT, Karlsson J et al (2023) Natural language processing: using artificial intelligence to understand human language in orthopedics. Knee Surg Sports Traumatol Arthrosc 31(4):1203–1211
11. Rädsch T, Reinke A, Weru V, Tizabi MD, Schreck N, Kavur AE et al (2023) Labelling instructions matter in biomedical image analysis. Nat Mach Intell 5:273–283
12. Rubinger L, Gazendam A, Ekhtiari S, Bhandari M (2023) Machine learning and artificial intelligence in research and healthcare. Injury 54(Suppl 3):S69–S73
13. Shiner B, Levis M, Dufort VM, Patterson OV, Watts BV, DuVall SL et al (2022) Improvements to PTSD quality metrics with natural language processing. J Eval Clin Pract 28:520–530
14. Singhal K, Tu T, Gottweis J, Sayres R, Wulczyn E, Hou L et al (2023) Towards expert-level medical question answering with large language models. arXiv preprint arXiv:2305.09617
15. Wyles CC, Tibbo ME, Fu S, Wang Y, Sohn S, Kremers WK et al (2019) Use of natural language processing algorithms to identify common data elements in operative notes for total hip arthroplasty. J Bone Joint Surg Am 101:1931–1938
16. Yang X, Chen A, PourNejatian N, Shin HC, Smith KE, Parisien C et al (2022) A large language model for electronic health records. NPJ Digit Med 5:194
17. Zhu E, Sheng Q, Yang H, Li J (2022) A Unified Framework of Medical Information Annotation and Extraction for Chinese Clinical Text. arXiv preprint arXiv:2203.03823
18. Zsidai B, Kaarre J, Hamrin Senorski E, Feldt R, Grassi A, Ayeni OR et al (2022) Living evidence: a new approach to the appraisal of rapidly evolving musculoskeletal research. Br J Sports Med. https://doi.org/10.1136/bjsports-2022-105570

Acknowledgements

Not applicable.

Data sharing statement

Not applicable.

Patient and Public Involvement

Not applicable.

Funding

Open access funding provided by University of Gothenburg.

Author information

Authors and Affiliations

Sahlgrenska Sports Medicine Center, Gothenburg, Sweden
Bálint Zsidai, Janina Kaarre, Eric Narup, Eric Hamrin Senorski & Kristian Samuelsson
Department of Orthopaedics, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
Bálint Zsidai, Janina Kaarre, Eric Narup, Volker Musahl & Kristian Samuelsson
Department of Orthopaedic Surgery, UPMC Freddie Fu Sports Medicine Center, University of Pittsburgh, Pittsburgh, USA
Janina Kaarre & Volker Musahl
Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
Ann-Sophie Hilkert & Robert Feldt
Medfield Diagnostics AB, Gothenburg, Sweden
Ann-Sophie Hilkert
Department of Health and Rehabilitation, Institute of Neuroscience and Physiology, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
Eric Hamrin Senorski
Sportrehab Sports Medicine Clinic, Gothenburg, Sweden
Eric Hamrin Senorski
IIa Clinica Ortopedica E Traumatologica, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
Alberto Grassi
Division of Orthopaedic Surgery, Department of Surgery, McMaster University, Hamilton, Canada
Olufemi R. Ayeni
Department of Mathematics, University of Luxembourg, Esch-Sur-Alzette, Luxembourg
Christophe Ley
Department of Trauma, Hand and Reconstructive Surgery, University Hospital Münster, Münster, Germany
Elmar Herbst
Department of Orthopedic Surgery and Traumatology, Head Knee Surgery and DKF Head of Research, Kantonsspital Baselland, 4101, Bruderholz, Bottmingen, Switzerland
Michael T. Hirschmann
Center of Orthopaedics and Traumatology, University Hospital Brandenburg a.d.H., Brandenburg Medical School Theodor Fontane, 14770, Brandenburg, Germany
Sebastian Kopf
Faculty of Health Sciences Brandenburg, Brandenburg Medical School Theodor Fontane, 14770, Brandenburg, Germany
Sebastian Kopf
Department of Orthopaedic Surgery, Centre Hospitalier Luxembourg and Luxembourg Institute of Health, Luxembourg, Luxembourg
Romain Seil
Clinic for Orthopaedics and Trauma Surgery, Malteser Waldkrankenhaus St. Marien, Erlangen, Germany
Thomas Tischer
Department of Orthopaedics, Sahlgrenska University Hospital, Mölndal, Sweden
Kristian Samuelsson

Consortia

ESSKA Artificial Intelligence Working Group

Contributions

The initial manuscript was drafted by BZ and RF. All authors contributed substantially to the conception of the idea for this editorial, reviewed and edited the text and approved the final version.

Corresponding author

Correspondence to Bálint Zsidai.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

VM reports educational grants, consulting fees and speaking fees from Smith & Nephew plc and educational grants from Arthrex, and is a board member of the International Society of Arthroscopy, Knee Surgery and Orthopaedic Sports Medicine (ISAKOS). In addition, VM is the deputy editor-in-chief of Knee Surgery, Sports Traumatology, Arthroscopy (KSSTA) and holds a patent, Quantified injury diagnostics (U.S. Patent No. 9,949,684, issued on April 24, 2018, to University of Pittsburgh). MB reports consulting fees from Bioventus, Pendopharm and Acumed. KS is a member of the board of directors of Getinge AB (publ).

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zsidai, B., Kaarre, J., Hilkert, AS. et al. Accelerated evidence synthesis in orthopaedics—the roles of natural language processing, expert annotation and large language models. J EXP ORTOP 10, 99 (2023). https://doi.org/10.1186/s40634-023-00662-4

Received: 10 July 2023
Accepted: 20 September 2023
Published: 28 September 2023
DOI: https://doi.org/10.1186/s40634-023-00662-4
