Deep learning techniques for biomedical and health informatics

DOCUMENT INFORMATION

Basic information

Format
Pages	395
File size	10.88 MB
Attached file	45. Deep Learnin.rar (8 MB)

Structure

Preface
- Overview
- Objective
- Organization
- Target Audiences
Acknowledgements
Contents
Editors and Contributors
Abbreviations
Deep Learning for Biomedical Engineering and Health Informatics
MedNLU: Natural Language Understander for Medical Texts
- 1 Introduction
- 2 Related Works
- 3 Methodology
  - 3.1 Word Embedding
  - 3.2 Character Embedding
  - 3.3 Feature Vector
  - 3.4 Bidirectional Long Short-Term Memory (Bi-LSTM)
  - 3.5 Conditional Random Fields
- 4 Corpora Statistics
- 5 Experiments and Observations
- 6 Conclusion
- References
Deep Learning Based Biomedical Named Entity Recognition Systems
- 1 Introduction
- 2 Literature Review
- 3 Architecture
  - 3.1 Extraction of Features in Sentence Level
  - 3.2 Criteria of Label
  - 3.3 Stochastic Gradient
- 4 Experiment
- 5 Result of Experiment and Its Analysis
- 6 Conclusion and Future Scope
- References
Disambiguation Model for Bio-Medical Named Entity Recognition
- 1 Introduction
  - 1.1 Rule-Based Approach
  - 1.2 Dictionary-Based Approach
  - 1.3 Machine Learning Based Approach
- 2 Background
  - 2.1 Deep Learning Technique
- 3 Methodology
- 4 Evaluation
  - 4.1 Dataset Description
  - 4.2 Evaluation Metric
  - 4.3 Post Processing and Parameters Setting
- 5 Result and Discussion
- 6 Conclusions
- References
Applications of Deep Learning in Healthcare and Biomedicine
- 1 Introduction
  - 1.1 Machine Learning
  - 1.2 Artificial Neural Network
  - 1.3 Deep Learning
- 2 Deep Learning: Recent Trends
  - 2.1 In Non-biological Domains
  - 2.2 In Biological Domain
- 3 Applications of Deep Learning in Biomedicine
  - 3.1 Biomarkers
  - 3.2 Genomic Study
  - 3.3 Transcriptomic Analysis
  - 3.4 Medical Image Processing
  - 3.5 Splicing
  - 3.6 Proteomic Study
  - 3.7 Structural Biology and Chemistry
  - 3.8 Drug Discovery
- 4 Applications of Deep Learning in Health Care
  - 4.1 Translational Bioinformatics
  - 4.2 Universal Sensing for Health and Wellbeing
  - 4.3 Informatics in Medicine
  - 4.4 Public Health
- 5 Challenges of Deep Learning in Biomedicine and Healthcare
- 6 Conclusion
- References
Deep Learning for Clinical Decision Support Systems: A Review from the Panorama of Smart Healthcare
- 1 Introduction
- 2 Deep Learning and Image Analysis
- 3 DL and Natural Language Processing
  - 3.1 Challenges for Using DL for NLP in Healthcare
- 4 DL and Wearable Device Technology
- 5 Issues in Using DL for CDSS
- 6 Future Research Directions
- 7 Conclusions
- References
Review of Machine Learning and Deep Learning Based Recommender Systems for Health Informatics
- 1 Introduction to Biomedical and Health Informatics
- 2 Introduction to Recommender System
  - 2.1 Application in Healthcare
  - 2.2 System Architecture
- 3 Overview of Health Recommender System
- 4 Learning Techniques for Health Informatics
  - 4.1 Supervised Learning
  - 4.2 Semi-supervised Learning
  - 4.3 Unsupervised Learning
  - 4.4 Performance Metrics
- 5 Deep Learning for Health Data
  - 5.1 Supervised Learning
  - 5.2 Unsupervised Deep Learning
- 6 Conclusion
- References
Deep Learning and Electronics Health Records
Deep Learning and Explainable AI in Healthcare Using EHR
- 1 Introduction
- 2 Related Work
- 3 Proposed Methodology
  - 3.1 Conceptual System Design
  - 3.2 Attention Models
  - 3.3 GRU: How It Works
  - 3.4 LIME Algorithm
- 4 Results and Discussions
  - 4.1 Multi-layer Perceptron (MLP)
  - 4.2 Random Forest Algorithm
  - 4.3 Naive Bayes Algorithm
  - 4.4 Results for Attention Mechanisms
- 5 Conclusions
- References
Deep Learning for Analysis of Electronic Health Records (EHR)
- 1 Introduction
- 2 Electronic Health Record (EHR) Systems
- 3 An Overview of Machine Learning
- 4 Deep Learning and Its Approaches
  - 4.1 Multilayer Perceptron (MLP)
  - 4.2 Convolutional Neural Networks (CNN)
  - 4.3 Recurrent Neural Networks (RNN)
  - 4.4 Auto-encoders (AE)
  - 4.5 Restricted Boltzmann Machine (RBM)
- 5 Deep EHR Learning Applications
  - 5.1 EHR Information Extraction (IE)
  - 5.2 EHR Representation Learning
- 6 Interpretability
  - 6.1 Maximum Activation
  - 6.2 Constraints
  - 6.3 Qualitative Clustering
  - 6.4 Mimic Learning
- 7 Discussion and Future Prospectus
- References
Application of Deep Architecture in Bioinformatics
- 1 Introduction
  - 1.1 Deep Learning: An Overview
  - 1.2 An Overview of Protein Structures
- 2 Deep Learning Approaches for Predicting Protein Structures
  - 2.1 Predicting with Long Short Term Memory (LSTM) Network
  - 2.2 Deep Supervised and Convolutional Generative Stochastic Networks
  - 2.3 Latent Convolutional Neural Networks
- 3 Deep Learning Approach for Protein–Protein Interaction and Protein Function Prediction
  - 3.1 Identification of Protein Function Based on Its Structure Using Deep CNN
  - 3.2 DL Based PPI Interface Residue Pair Prediction
- 4 DL in Medical Imaging and Disease Diagnosis
  - 4.1 Patch-Based CNN Approach for Brain MRI Segmentation
- References
Intelligent, Secure Big Health Data Management Using Deep Learning and Blockchain Technology: An Overview
- 1 Introduction
- 2 Related Works
- 3 Preliminaries
  - 3.1 Internet of Things (IoT)
  - 3.2 Big Data
  - 3.3 Deep Learning
  - 3.4 Popular Deep Learners
  - 3.5 Applications and Challenges of Deep Learners
  - 3.6 Blockchain Technology
  - 3.7 Types of Blockchain
  - 3.8 Challenges of Blockchain in Healthcare
- 4 System Model
- 5 Open Research Issues
- 6 Conclusion
- References
Malaria Disease Detection Using CNN Technique with SGD, RMSprop and ADAM Optimizers
- 1 Introduction
- 2 Background
  - 2.1 Convolutional Neural Network (CNN)
  - 2.2 Stochastic Gradient Descent (SGD)
  - 2.3 RMSprop
  - 2.4 Adaptive Moment Estimation (ADAM)
- 3 Automated Diagnosis of Malaria
  - 3.1 Image Acquisition
  - 3.2 Data Visualization
  - 3.3 Data Preprocessing
- 4 Proposed Model
  - 4.1 Malaria Detection Using SGD Optimizer
  - 4.2 Malaria Detection Using RMSprop Optimizer
  - 4.3 Malaria Detection Using ADAM Optimizer
- 5 Comparison of Different Techniques
- 6 Conclusion and Future Work
- References
Deep Reinforcement Learning Based Personalized Health Recommendations
- 1 Introduction
- 2 Background
  - 2.1 Recommendation Systems
  - 2.2 Facts and Figures
  - 2.3 Big Data
  - 2.4 Reinforcement Learning
- 3 Problems
  - 3.1 Data Utilization
  - 3.2 Health Awareness
  - 3.3 Doctor to Patient Ratio
  - 3.4 Information Security
- 4 The Limitations of Existing Solutions
  - 4.1 Lack of an All-round Solution
  - 4.2 System Bias
  - 4.3 Myopic Recommendation
- 5 Features of RL that Can Help Solve the Problems
  - 5.1 Discounted Future Rewards
  - 5.2 Exploration-Exploitation Control
  - 5.3 Ability to Learn in Dynamic Environments
- 6 The Proposed Framework
  - 6.1 The Data Preprocessing Layer
  - 6.2 The Disease Prediction Layer
  - 6.3 The Recommendation Generation Layer
- 7 Future Improvements
  - 7.1 Actor-Critic Recommendation System
  - 7.2 Recommendations
  - 7.3 Data Preprocessing
  - 7.4 Disease Prediction
- 8 Conclusion
- References
Using Deep Learning Based Natural Language Processing Techniques for Clinical Decision-Making with EHRs
- 1 Introduction
- 2 Deep Learning for Natural Language Processing
  - 2.1 Distributed Representation
  - 2.2 Convolutional Neural Networks (CNN)
  - 2.3 Recurrent Neural Networks
  - 2.4 Transformer-Based Neural Networks
  - 2.5 Generative Adversarial Network
- 3 Major Applications of Deep Learning in Medical Information Processing
  - 3.1 Representation Learning (RL)
  - 3.2 Information Extraction (IE)
  - 3.3 Clinical Predictions (CP)
- 4 Challenges and Remaining Problems
- 5 Conclusion and Direction of Future Research
- References
Deep Learning for Medical Image Processing
Diabetes Detection Using ECG Signals: An Overview
- 1 Introduction
- 2 Diabetes
  - 2.1 Diabetes and Its Associated Mechanism
  - 2.2 Types of Diabetes
  - 2.3 Complications Due to Diabetes
  - 2.4 Causes (Risk Factors) of Diabetes
  - 2.5 Treatment and Management of Diabetes
- 3 Common Methods of Diabetes Detection
  - 3.1 Invasive Methods of Diabetes Detection (Blood Testing)
  - 3.2 Non-invasive Methods of Diabetes Detection (Using ECG Analysis)
- 4 Machine Learning for Diabetes Detection
  - 4.1 Time Domain Methods
  - 4.2 Frequency Domain Methods
  - 4.3 Wavelet Transform
  - 4.4 Nonlinear Methods
- 5 Methodology of Deep Learning Techniques
  - 5.1 Autoencoder (AE)
  - 5.2 Convolutional Neural Network (CNN)
  - 5.3 Recurrent Structures (RNN, LSTM and GRU)
  - 5.4 Hybrid of CNN-RNN, CNN-LSTM, CNN-GRU
- 6 Literature Survey
  - 6.1 Earlier Methods of Analysis of HRV Signals
  - 6.2 Previous Works of Diabetes Detection Using Heart Rate (Including Machine Learning Based)
  - 6.3 Deep Learning Based Diabetes Detection Works Using HRV
- 7 Architecture and Implementation of Deep Learning Architecture—Sample Study
- 8 Deep Learning in Big Data Analysis: Limitations and Challenges
- 9 Conclusion
- References
Deep Learning and the Future of Biomedical Image Analysis
- 1 Introduction
  - 1.1 Deep Learning
  - 1.2 Biomedical Imaging
  - 1.3 Role of Deep Learning in Diagnosis from Various Medical Images
  - 1.4 Applications
- 2 Deep Learning in Medical Imaging
  - 2.1 Classification
  - 2.2 Detection
  - 2.3 Segmentation
  - 2.4 Registration
  - 2.5 Other Tasks in Medical Imaging
- 3 Future of Deep Learning in Biomedical Imaging
  - 3.1 Recent Methods and Predictive Models
- 4 Challenges and Issues
- References
Automated Brain Tumor Segmentation in MRI Images Using Deep Learning: Overview, Challenges and Future
- 1 Introduction
- 2 What Is Brain Tumor?
  - 2.1 Types of Brain Tumor [6–9]
- 3 What Is Deep Learning?
  - 3.1 Deep Learning Architecture and Neural Network
- 4 Benefits of Deep Learning Over Machine Learning
  - 4.1 Comparison of Different Architecture of Deep Learning Models
- 5 Brain Tumor Classification Steps
  - 5.1 MR Image Acquisitions
  - 5.2 Image Preprocessing
  - 5.3 Feature Extraction
  - 5.4 Feature Reduction Using Genetic Algorithm
  - 5.5 Neuro Fuzzy Modeling
  - 5.6 ANFIS Editor
  - 5.7 Training and Testing Phase
  - 5.8 Image Segmentation Methods [59–61]
  - 5.9 Fuzzy C-Means Segmentation
- 6 Various Challenges Faced by Deep Learning
  - 6.1 Huge Amount of Data
  - 6.2 Domain Specific and Multi-tasking
  - 6.3 Deep Learning Is Intrinsically a Black Box
  - 6.4 Optimizing Hyper-parameters
  - 6.5 Requires High Performance Hardware
  - 6.6 Less Flexibility
- 7 Research Issues and Future Perspectives
  - 7.1 Enhancements in Deep Learning Approach
  - 7.2 Big Image Data Exploitation
  - 7.3 Pervasive Inter-organization Collusion
  - 7.4 Privacy and Judicial Concerns
- 8 Performance Comparison
- 9 Conclusion
- 10 Future Scope
- References

Content

Kumar

of Technology Raipur, Raipur, Chhattisgarh, India

Avinash Kumar School of Computer Engineering, KIIT DU, Bhubaneswar, India

Ujjwal Maulik Department of Computer Science and Engineering, Jadavpur University, Jadavpur, Kolkata, India

Neha Miglani Department of Computer Engineering, National Institute of Technology, Kurukshetra, India

Pragatika Mishra Gandhi Institute for Technology, Bhubaneswar, India

Shubham Mittal Delhi Technological University, Delhi, India

Jayraj Mulani Department of Computer Science and Engineering, Institute of Technology Nirma University, Ahmedabad, India

Jigna Patel Department of Computer Science and Engineering, Institute of Technology Nirma University, Ahmedabad, India

Jitali Patel Department of Computer Science and Engineering, Institute of Technology Nirma University, Ahmedabad, India

Chittaranjan Pradhan School of Computer Engineering, KIIT DU, Bhubaneswar, India

U Reshma Arnekt Solutions Pvt Ltd., Magarpatta City, Pune, Maharashtra, India

Jayita Saha Computer Science and Engineering, Jadavpur University, Kolkata, India

Sohail Saif Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, West Bengal, India

E Sandeep Kumar Department of Telecommunication Engineering, M.S. Ramaiah Institute of Technology, Bengaluru, India

Sobhangi Sarkar School of Computer Engineering, KIIT DU, Bhubaneswar, India

Pappu Satya Jayadev Department of Electrical Engineering, IIT Madras, Chennai, India

Sagnik Sen Department of Computer Science and Engineering, Jadavpur University, Jadavpur, Kolkata, India

Minakshi Sharma Department of Computer Engineering, National Institute of Technology, Kurukshetra, India

Gayatri Shinde Department of Computer Engineering, VESIT, Mumbai, India

K P Soman Amrita School of Engineering, Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India

Vignesh Subramanian Department of Computer Engineering, VESIT, Mumbai, India

G Swapna Amrita School of Engineering, Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India

Xinhui Tu School of Computer Science, Central China Normal University, Wuhan, China

Kalpan Tumdi Department of Computer Science and Engineering, Institute of Technology Nirma University, Ahmedabad, India

R Vinayakumar Amrita School of Engineering, Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India

Runjie Zhu Information Retrieval and Knowledge Management Research Lab, Department of Electrical Engineering and Computer Science, York University, Toronto, Canada

ADC Analog-to-digital converter

ADHD Attention-deficit hyperactivity disorder

ADNI Alzheimer’s disease neuroimaging initiative

AFLC Adaptive fuzzy leader clustering algorithm

AMBE Absolute mean brightness error

ANFIS Adaptive neuro-fuzzy inference system

Anti-CCP Anti-cyclic citrullinated peptide

BERT Bidirectional Encoder Representations From Transformer

BETA Blackbox Explanations Using Transparent Approximations

Bi-LSTM Bidirectional long short-term memory

Bio-NER Biomedical named entity recognition

BMESO B-Begin M-Middle E-End S-Single O-Outside

BoW Bag of words

BRCA1 Breast cancer gene type 1

CAMDM Computer-aided medical decision making

CBIR Content-based image retrieval

CBoW Continuous bag of words

CDBN Convolutional deep belief networks

CDC Centers For Disease Control And Prevention

CDP Code On Dental Procedures And Nomenclature

CDSS Clinical decision support system

CGMS Continuous glucose monitoring system

CLAHE Contrast-limited adaptive histogram equalization

CRPS Continuous ranked probability score

DCNNs Deep convolutional neural networks

DRMM Deep relevance matching model

ELMO Embeddings from language models

ESRD End-stage renal disease

FDA Food And Drug Administration

FHE Fuzzy logic-based histogram equalization

FITBIR The Federal Interagency Traumatic Brain Injury Research

GBDT Gradient boosting decision trees

GRAM Graph-based attention model

HCPCS Healthcare Common Procedure Coding System

HCUP Healthcare Cost And Utilization Project

HPI History of patient illness

HRV Heart rate variability

i2b2 Informatics For Integrating Biology and The Bedside

ICD International Classification of Diseases

ICD9 International Classification of Diseases 9

KPCA Kernel principal component analysis

LIDC Lung Image Database Consortium Dataset

LSTM RNN Long short-term Memory RNN

LSTM Long short-term memory

MCEMJ Medical Concept Embeddings From Medical Journals

MDP Markov decision process

medGAN Medical Generative Adversarial Network

MEMM Maximum entropy Markov model

MICCAI Medical image computing and computer-assisted intervention

MIDAS The Multimedia Medical Archiving System

MILA Montreal Institute For Learning Algorithms

MIMIC Medical Information Mart For Intensive Care

NIHCC National Institute of Health Clinical Centre

NLM National Library of Medicine

OASIS Open Access Series of Imaging Studies

OGTT Oral glucose tolerance test

PINN Pairwise input neural network

POMDP Partially observed Markov decision process

PSNR Peak signal-to-noise ratio

QSAR Quantitative structure-activity relationship

RCNNs Region convolutional neural networks

RMSHE Recursive mean separate histogram equalization

RMSProp Root mean square propagation

RP LIME Random pick local interpretable model

RSNA Radiological Society of North America

SBE Surrounding-based embedding feature

SEER Surveillance, Epidemiology, And End Results Program

SHMS Smart healthcare monitoring system

SIFT Scale-invariant feature transform

SiRNA Small interfering ribonucleic acid

SP LIME Selective pick local interpretable model

SPECT Single-photon emission computed tomography

SPPMI Shifted positive pointwise mutual information

SRL-RNN Supervised reinforcement learning with recurrent neural network

SSIM Structural similarity index mean

STARE Structured analysis of the retina

TCIA The Cancer Imaging Archive

TP True positive

t-SNE T-distributed stochastic neighbor embedding

UCI University of California, Irvine

UMLS Unified medical language system

USF University of Southern California

VEGF Vascular endothelial growth factor

VHL Von-Hippel–Lindau Illness

VIA Visual and image analysis

WBAN Wireless body area network

WBCD Wisconsin Breast Cancer Dataset

H B Barathi Ganesh, U Reshma, K P Soman and M Anand Kumar

Natural Language Understanding (NLU) is crucial for developing clinical text-based applications and is primarily achieved through Vector Space Models and Sequential Modelling. This study emphasizes sequential modelling techniques, specifically Named Entity Recognition (NER) and Part of Speech (POS) tagging, achieving an F1 score of 93.8% on the i2b2 clinical corpus and 97.29% on the GENIA corpus. The paper discusses the effectiveness of feature fusion by combining word, feature, and character embeddings for these tasks. Additionally, we introduce MedNLU, a framework for sequential modelling that performs POS tagging, chunking, and entity recognition in clinical texts. MedNLU integrates a Convolutional Neural Network, Conditional Random Fields, and a Bi-directional Long Short-Term Memory network.

Medical fields generate digital data in the form of clinical reports: structured/semi-structured data, raw data and the amount of data consumers/patients generate in

Amrita School of Engineering, Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: barathiganesh.hb@gmail.com

K P Soman e-mail: kp_soman@amrita.edu.com

Arnekt Solutions Pvt Ltd., Pentagon P-3, Magarpatta City, Pune, Maharashtra, India e-mail: reshma.u@arnekt.com

Department of Information Technology, National Institute of Technology Karnataka, Surathkal, India e-mail: m_anandkumar@nitk.edu.in

© Springer Nature Switzerland AG 2020

S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics, Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_1

In the realm of social media, the vast amount of data generated makes it impossible for any one person to fully grasp all the information. This is where Natural Language Processing (NLP), a key subfield of Artificial Intelligence, becomes essential. MedNLU is a specialized framework designed to interpret the insights hidden in digital health data. It serves as a building block for healthcare applications that rely on natural language processing and understanding.

MedNLU draws on several subfields of Artificial Intelligence, including Natural Language Processing (NLP), conventional Machine Learning, and Deep Learning. It processes healthcare texts and documents, generating outputs such as tokenized text, chunked text, parsed text, recognized entities, and Part of Speech (POS) tags related to the medical content.

Entities and POS tags make it possible to construct a knowledge base, which is essential for database management and conversational systems. The components of the MedNLU framework are illustrated in Fig. 1.

Electronic Health Records (EHRs) are a primary source of health documents. They hold comprehensive patient information such as family history, initial complaints, diagnoses, treatments, prescriptions, lab results, visit records, billing data, demographics, progress notes, vital signs, medical histories, immunizations, allergies, and radiology images. This extensive data gives clinicians and physicians immediate access to nearly all pertinent details about a patient.

The MedNLU framework illustrates the impact of Natural Language Processing (NLP) in healthcare, enabling systems to function without manual document verification. Hospitals globally are increasingly adopting NLP technologies in their daily operations to improve the efficiency and accuracy of medical documentation.

Information extraction is essential for developing applications such as decision support systems, adverse drug reaction identification, pharmacovigilance, and the management of pharmacokinetics. It also aids in patient cohort identification and in the development and maintenance of electronic medical records (EMR). Key information is primarily obtained through natural language understanding (NLU) tasks, including Named Entity Recognition (NER), Part of Speech (POS) tagging, and chunking.

The medical domain has traditionally approached Natural Language Understanding (NLU) with rule-based methodologies, in which hand-coded rules extract valuable information from medical data. These rules were tailored to the specific structure of each document, allowing for effective knowledge extraction. However, the introduction of algorithm-driven models has significantly reduced the burden of manual data encoding. Recently, researchers have begun applying Deep Learning techniques to healthcare data for tasks such as Named Entity Recognition (NER) and relation extraction.

The MedNLU framework performs sequential modeling tasks by integrating word, feature, and character embeddings, using a combination of Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and Conditional Random Fields (CRF). This approach enables the framework to perform Named Entity Recognition, Part of Speech Tagging, Parsing, and Chunking on clinical texts. Notably, the experiment demonstrates that the sequence model architecture achieves state-of-the-art performance even without a domain-specific word embedding model, relying instead on word embeddings derived from general English text.
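The feature fusion described above can be sketched as a simple concatenation of three per-token vectors. This is a minimal illustration, not the MedNLU implementation: the vector sizes, the three orthographic features, the random lookup tables, and the max-pooling stand-in for a character CNN are all assumptions made for readability, and the names `lookup`, `char_vector`, `feature_vector`, and `fuse` are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the illustrative vectors are reproducible

WORD_DIM, CHAR_DIM = 4, 3  # hypothetical sizes, chosen only for readability

# Toy lookup tables standing in for trained embeddings (assumption: random values).
word_table = {}
char_table = {}


def lookup(table, key, dim):
    """Return the vector for `key`, creating a random one on first use."""
    if key not in table:
        table[key] = [random.uniform(-1.0, 1.0) for _ in range(dim)]
    return table[key]


def char_vector(token):
    """Max-pool character embeddings over the token (a stand-in for a char CNN)."""
    vectors = [lookup(char_table, ch, CHAR_DIM) for ch in token]
    return [max(column) for column in zip(*vectors)]


def feature_vector(token):
    """Hand-crafted orthographic features (a hypothetical choice of three)."""
    return [
        1.0 if token[0].isupper() else 0.0,                 # starts with a capital
        1.0 if any(ch.isdigit() for ch in token) else 0.0,  # contains a digit
        1.0 if token.isupper() else 0.0,                    # every cased letter is upper case
    ]


def fuse(token):
    """Concatenate word, feature, and character representations for one token."""
    word = lookup(word_table, token.lower(), WORD_DIM)
    return word + feature_vector(token) + char_vector(token)


sentence = ["Patient", "received", "5mg", "of", "Warfarin"]
fused = [fuse(tok) for tok in sentence]
```

In a full model, each fused vector would be one time step of the input to the Bi-LSTM, whose outputs a CRF layer would then decode into a tag sequence.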

The effective computation of dense word matrices and the integration of downstream models on word2vec with various architectures have significantly influenced research on leveraging Big Data within the healthcare domain. Consequently, text classification tasks, including sentiment analysis, text summarization, information extraction, and information retrieval, increasingly use word embeddings. This trend is particularly pronounced in healthcare and bioinformatics, where specific challenges such as relation extraction, named entity recognition, drug-disease interactions, medical synonym extraction, and chemical-disease relations are receiving heightened focus. Both closed-set small corpora and large general corpora, like Google News, are being employed in these efforts.

Wikipedia is frequently utilized for training embedding models; however, these models are not directly applicable to clinical texts, which contain a higher frequency of specialized terminology and do not adhere to standard grammatical structures.

Various methodologies have been employed to evaluate word embedding models after computing word vectors, including context predicting and context counting, which assess the relationship between data and correlation issues for lexical semantic tasks. The context-predicting model is preferred over the count-based model due to its superior performance. Thomas Landauer used Latent Semantic Analysis for indirect knowledge acquisition from text, analyzing similarities through local co-occurrence. Additionally, Turney applied unsupervised vectors for classification in analogy tasks, and many researchers have since sought to refine this unsupervised learning approach for text applications. In the bioinformatics field, Pakhomov conducted assessments of word embeddings.
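To make the count-based side of this comparison concrete, the sketch below builds word vectors from windowed co-occurrence counts weighted by positive pointwise mutual information, then compares two words by cosine similarity. It is only an illustration under stated assumptions: the three-sentence corpus, the window size, and the function names are invented here, and a context-predicting model such as word2vec would learn its vectors by training rather than by counting.

```python
import math
from collections import Counter

# Toy corpus (invented for illustration), already tokenized.
corpus = [
    "patient denies chest pain".split(),
    "patient reports chest pain".split(),
    "patient denies fever".split(),
]
WINDOW = 2  # hypothetical context window: two tokens on each side


def cooccurrence(sentences, window):
    """Count ordered (word, context) pairs that fall within the window."""
    counts = Counter()
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, sent[j])] += 1
    return counts


def ppmi_vectors(counts):
    """Turn raw counts into sparse vectors of positive PMI values."""
    total = sum(counts.values())
    row, col = Counter(), Counter()
    for (word, context), n in counts.items():
        row[word] += n
        col[context] += n
    vectors = {}
    for (word, context), n in counts.items():
        pmi = math.log((n * total) / (row[word] * col[context]))
        if pmi > 0:  # keep only positive associations
            vectors.setdefault(word, {})[context] = pmi
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


counts = cooccurrence(corpus, WINDOW)
vectors = ppmi_vectors(counts)
similarity = cosine(vectors.get("denies", {}), vectors.get("reports", {}))
```

Words that occur in similar contexts, such as "denies" and "reports" here, share context dimensions and therefore receive a non-zero similarity.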

Clinical text analysis faces significant challenges due to HIPAA restrictions, resulting in limited resources for clinical Part-of-Speech (POS) tagging. Notably, Pestian et al. reported a POS annotation of 390,000 pediatric sequences at Cincinnati Children’s Medical Centre, achieving 91.5% accuracy with a tagger enhanced by a Special Lexicon. However, both the tagger and corpus remain unavailable. To address the limitations of clinical text annotation, Liu et al. developed sampling methods to co-train a POS tagger alongside the Wall Street Journal (WSJ) corpus. Their evaluation of one sampling method on pathology reports showed an 84% reduction in training data while still achieving an accuracy of 92.7%.

The Mayo Clinic in Rochester, Minnesota, developed the MED corpus, which consists of 100,650 POS-tagged tokens derived from 273 clinical notes, addressing the lack of accessible annotated corpora for research. This initiative achieved an accuracy of 93.6% when annotations were combined with GENIA and other POS-tagged corpora. Additionally, despite the scarcity of clinical text corpora, the Mayo Clinic created cTAKES, a comprehensive biomedical NLP package that serves as a pre-trained, reusable tagging model.

Traditional Named Entity Recognition (NER) methods primarily relied on dictionary and rule-based approaches, necessitating domain expertise for effective rule detection. Initially, researchers focused on conventional machine learning techniques or a combination of these with rule-based methods. Various supervised and semi-supervised machine learning algorithms have been applied to NER tasks, emphasizing domain-specific attributes and specialized text features. Hybrid models that integrate Conditional Random Fields (CRF) with Support Vector Machines (SVM) and various pattern matching rules have demonstrated improved results. Additionally, incorporating pre-processing techniques such as annotation and true casing with CRF-based NER has enhanced concept extraction performance. The top-performing models in the i2b2 challenge utilized CRF and semi-Markov Hidden Markov Models (HMM), achieving an F-score of 0.85.
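F-scores for NER are usually computed over whole entities rather than individual tokens. The sketch below shows one way to do that for tag sequences in the BMESO scheme listed in the abbreviations: it decodes each sequence into labelled spans and counts a prediction as correct only when both boundaries and the label match exactly. The function names, the entity labels, and the example sentence are illustrative assumptions; the scoring scripts actually used in the i2b2 challenge may differ in detail.

```python
def extract_spans(tags):
    """Decode BMESO tags into a set of (start, end, label) spans, end inclusive."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, kind = tag.partition("-")
        if prefix == "S":  # single-token entity
            spans.add((i, i, kind))
            start, label = None, None
        elif prefix == "B":  # entity begins
            start, label = i, kind
        elif prefix == "M" and start is not None and kind == label:
            continue  # still inside the open entity
        elif prefix == "E" and start is not None and kind == label:
            spans.add((start, i, kind))  # entity ends
            start, label = None, None
        else:  # "O" or an ill-formed sequence: discard any open span
            start, label = None, None
    return spans


def f_score(gold_tags, predicted_tags):
    """Return (precision, recall, F1) using exact span and label matching."""
    gold = extract_spans(gold_tags)
    pred = extract_spans(predicted_tags)
    true_positive = len(gold & pred)
    precision = true_positive / len(pred) if pred else 0.0
    recall = true_positive / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical six-token sentence with illustrative entity labels.
gold = ["O", "B-problem", "E-problem", "O", "S-treatment", "O"]
pred = ["O", "B-problem", "E-problem", "O", "O", "S-test"]
precision, recall, f1 = f_score(gold, pred)
```

Here the prediction recovers one of the two gold entities and adds one spurious entity, so precision, recall, and F1 are all 0.5.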

Date posted: 30/08/2021, 10:18