Robotics and Automation Engineering Journal

Editorial

Natural Language Processing and Machine Learning Techniques in Real World (Law and Health)

Mi Young Kim*

Department of Science, Augustana Faculty, University of Alberta, Canada

Submission: September 08, 2017; Published: September 21, 2017

*Corresponding author: Department of Science, Augustana Faculty, University of Alberta, 4901-46 Avenue, Camrose, AB, T4V 2R3, Canada, Tel: +1 780 679 1104; Email: miyoung2@ualberta.ca

How to cite this article: Mi Young Kim. Natural Language Processing and Machine Learning Techniques in Real World (Law and Health). Robot Autom Eng J. 2017; 1(3): 555562.

DOI: 10.19080/RAEJ.2017.01.555562

Editorial

Every day a large volume of legal documents is produced, and lawyers need support in analyzing these large document collections, especially in corporate litigation, which typically aims to find evidence for or against the litigation claims. Identifying the critical legal points in large volumes of legal text is time-consuming and costly, but recent advances in Natural Language Processing and Machine Learning have generated new enthusiasm for improved management of legal texts and the identification of legal relationships. I believe that developing tools to automatically or semi-automatically confirm entailment relationships between legal texts is fundamental to legal text understanding. Many researchers are interested in developing question answering tools that help novice users obtain answers to their legal questions, and an important initial step is the development of a question answering system for Yes/No questions.

Yes/No question answering is significantly easier than general question answering, but it still requires tools for determining the relationships among legal texts. The Competition on Legal Information Extraction/Entailment (COLIEE) Kim & Goebel [1] has been held since 2014. It focuses on two aspects of legal information processing related to answering Yes/No questions from legal bar exams: legal document retrieval (Phase 1) and Yes/No question answering for legal queries (Phase 2). Many systems have participated in the competition and have shown promising results in Yes/No question answering, using Machine Learning techniques such as Convolutional Neural Networks Kim et al. [2] and Natural Language Processing techniques such as syntactic parsing Marsi et al. [3], Haermeling [4].

I expect that, with improved Natural Language Processing and Machine Learning techniques, a general question answering system for the legal domain will be released soon.

Similarly, the medical domain needs computer science technologies to extract information from large volumes of medical data. Most clinical data consist of unstructured natural language sentences (e.g., clinical notes and Tele-Health transcripts), and there are few standard templates for their format. Even existing templates are limited in scope, brittle in structure, and often adjusted, which makes calibration difficult. It is also difficult to automatically extract the information that health care professionals consider important for improving health care, because most directly captured health data contains a great deal of noise.

A tool that parses patient records to automatically detect signs of a possible health issue would be a tremendous help to epidemiologists and other health professionals, allowing them to react more rapidly to a variety of trends. Recent advances in Artificial Intelligence (AI) and Natural Language Processing techniques, such as information extraction, named entity recognition, and factual assessment, support the development of such tools. Clinical notes are an integral part of Electronic Health Records (EHRs), but they pose special challenges for EHR analysis: they are unstructured and contain substantial noise, because health practitioners write them in real time while talking with patients.

This noise can be divided into two types. The first is explicit noise, such as spelling errors, abbreviations, unspecified acronyms, unfinished sentences, term variants, and omitted sentence delimiters. The second is implicit noise, which can be revealed only through inference: we have to work out the intended meaning and linguistic structure of sentences to detect it. Examples of implicit noise include written information that is not about the patient and untrustworthy information that may not be true.
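To make the notion of explicit noise concrete, the following Python sketch normalizes two of the listed kinds, abbreviations and spelling errors, in a clinical note. The abbreviation table, the vocabulary, and the function names `normalize_token` and `normalize_note` are hypothetical illustrations rather than part of any system cited here; a real system would draw on curated clinical lexicons. Implicit noise cannot be handled by this kind of surface lookup at all.

```python
import difflib
import re

# Hypothetical abbreviation table; a real system would use a curated clinical lexicon.
ABBREVIATIONS = {
    "pt": "patient",
    "sob": "shortness of breath",
    "hx": "history",
    "c/o": "complains of",
}

# Hypothetical vocabulary of known terms, used to correct misspelled words.
VOCABULARY = ["fever", "cough", "headache", "nausea", "patient", "history"]


def normalize_token(token: str) -> str:
    """Expand a known abbreviation or snap a misspelled word to the closest known term."""
    lower = token.lower()
    if lower in ABBREVIATIONS:
        return ABBREVIATIONS[lower]
    if lower in VOCABULARY:
        return lower
    # get_close_matches returns candidates whose similarity ratio is at least `cutoff`.
    matches = difflib.get_close_matches(lower, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else lower


def normalize_note(note: str) -> str:
    """Reduce explicit noise (abbreviations and spelling errors) in a clinical note."""
    # Keep "/" inside tokens so abbreviations such as "c/o" survive; punctuation is dropped.
    tokens = re.findall(r"[A-Za-z/]+|\d+(?:\.\d+)?", note)
    return " ".join(normalize_token(t) for t in tokens)


print(normalize_note("Pt c/o feverr and cough, hx of SOB"))
# patient complains of fever and cough history of shortness of breath
```

The high similarity cutoff (0.8) is a design choice that keeps short, unrelated words such as "and" or "of" from being "corrected" into vocabulary terms.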

As part of our noise identification process, we have to detect biomedical named entities (such as diseases, viruses, drugs, and symptoms) as well as temporal information, temperature, travel history, and other kinds of patient-specific named entities (e.g., age and sex). Biomedical named entity recognition is the task of recognizing technical terms from the biomedical domain, such as diseases, drugs, and symptoms. Examples of biomedical named entity recognition systems include extracting clinical information from radiology reports Fiszman et al. [5], identifying diseases and drug names in discharge summaries Uzuner et al. [6], and detecting gene and protein mentions in biomedical paper abstracts Yeh et al. [7].
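Biomedical named entity recognition can be illustrated with the simplest possible approach, a dictionary (gazetteer) lookup. The sketch below assumes a small hypothetical lexicon (`GAZETTEER`) and whitespace-tokenized input, and tags entities by greedy longest match so that multi-word terms such as "shortness of breath" are preferred over their parts. It is not the method of any system cited above; those rely on richer Natural Language Processing and Machine Learning models.

```python
from typing import List, Tuple

# Hypothetical gazetteer mapping lower-cased token sequences to entity types.
GAZETTEER = {
    ("influenza",): "DISEASE",
    ("type", "2", "diabetes"): "DISEASE",
    ("h1n1",): "VIRUS",
    ("oseltamivir",): "DRUG",
    ("metformin",): "DRUG",
    ("fever",): "SYMPTOM",
    ("shortness", "of", "breath"): "SYMPTOM",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def tag_entities(tokens: List[str]) -> List[Tuple[str, str]]:
    """Return (entity text, entity type) pairs found by greedy longest-match lookup."""
    lowered = [t.lower() for t in tokens]
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so multi-word terms win over sub-terms.
        for length in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(lowered[i:i + length])
            if span in GAZETTEER:
                entities.append((" ".join(tokens[i:i + length]), GAZETTEER[span]))
                i += length
                break
        else:
            i += 1  # no entity starts at this token
    return entities


tokens = "Patient with H1N1 influenza and fever was given oseltamivir".split()
print(tag_entities(tokens))
# [('H1N1', 'VIRUS'), ('influenza', 'DISEASE'), ('fever', 'SYMPTOM'), ('oseltamivir', 'DRUG')]
```

Exact lookup of this kind cannot recognize unseen terms, misspellings, or ambiguous mentions, which is one reason learned models combined with noise removal are generally preferred for clinical notes.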

Many systems have successfully identified biomedical named entities in clinical notes by removing the noise described above with Natural Language Processing and Machine Learning techniques Kim et al. [8]. Other systems automatically detect frailty in senior patients Castell et al. [9] or predict pandemic diseases early by analyzing Twitter texts Signorini et al. [10]. In the near future, health professionals will rely on Natural Language Processing and Machine Learning techniques to analyze their patient data, and systems will even recommend appropriate treatment for each patient based on the analysis results.