Applying Knowledge Discovery Process in General Aviation Flight Performance Analysis

Air transportation is a technology-intensive industry that collects large volumes of data in a variety of forms from daily operations. Aviation data play a critical role in numerous aspects of the aviation industry, and aircraft flight performance analysis is one of their most important uses. Aircraft operational data are usually collected by onboard flight data recording devices and have traditionally been used to monitor flight safety and aircraft maintenance through basic statistical analysis and threshold exceedance detection. With the development of data science and advanced computing technology, there is growing interest in incorporating the knowledge discovery process into aviation operations. This article reviews recent studies on flight data analysis and presents two example studies that apply the knowledge discovery process to flight performance analysis in general aviation.


Introduction
In aviation, data analysis has been widely adopted for a variety of needs, such as aviation safety improvement, airspace utility assessment, and operational efficiency measurement. Depending on the purpose, aviation data are collected, analyzed, and interpreted from different perspectives with a variety of techniques. For example, air traffic volume data are usually used for airspace management and airline network planning, data on transportation gross or passenger load factor are used for analyses of airline economic performance, and flight operational data from flight data recorders can be used for safety analyses. The Federal Aviation Administration (FAA) System Safety Handbook documents as many as 81 common aviation data analysis techniques, and more are being developed [1]. Because of the diversity of aviation data and analytical purposes, aviation operators must make expensive investments in technological equipment, proprietary software, and long-term labor for data collection and analytics.

Trends in Technical & Scientific Research
Knowledge discovery is the nontrivial extraction of implicit, previously unknown, and potentially useful information from a collection of data. It includes obtaining raw data, cleaning and transforming the data, modeling, and converting the data into useful information to support decision-making, as shown in Figure 1 [2,3].
As an interdisciplinary area, the knowledge discovery process draws on database technology, information science, statistics, machine learning, visualization, and other disciplines. The knowledge discovery in databases (KDD) process includes the following nine steps [3]:
a) Develop an understanding of the application domain and the relevant prior knowledge, and identify the goal of the KDD process from the customer's perspective.
b) Select a target data set or subset of data samples on which discovery is to be performed.
c) Clean and preprocess the data, including removing noise, collecting the information necessary to model or account for noise, deciding on strategies for handling missing data, and accounting for time-sequence information and known changes.
d) Reduce and project the data by finding useful features to represent the data, depending on the goal or task.
e) Match the goals of the KDD process to a particular data mining method.
f) Perform exploratory analysis and model and hypothesis selection by choosing the data mining algorithms and selecting the methods to be used for searching for data patterns.
g) Mine the data to search for patterns of interest in a particular representational form or a set of such representations, such as classification rules or trees, regression, and clustering.
h) Interpret the mined patterns, possibly returning to any of steps a through g for further iteration.
i) Apply the discovered knowledge directly or incorporate it into another system for further action.
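The steps listed above can be sketched as a small chain of functions. This is only a minimal illustration under stated assumptions: the records, the field layout, the `limit_fpm` value, and every function name are hypothetical, and the "mining" step is deliberately trivial.

```python
from statistics import mean

# Hypothetical raw records: (time_s, altitude_ft, ground_speed_kt).
# None marks a missing value. The numbers are synthetic.
RAW = [
    (0, 1000.0, 95.0),
    (1, 1010.0, None),
    (2, 1020.0, 97.0),
    (3, None, 98.0),
    (4, 1040.0, 99.0),
]


def select(records):
    """Step b: choose the target subset (here every record is kept)."""
    return list(records)


def clean(records):
    """Step c: drop records with missing values (one possible strategy)."""
    return [r for r in records if None not in r]


def reduce_features(records):
    """Step d: derive one feature per consecutive pair, climb rate in ft/min."""
    return [
        60.0 * (b[1] - a[1]) / (b[0] - a[0])
        for a, b in zip(records, records[1:])
    ]


def mine(features):
    """Steps e to g: a deliberately trivial 'pattern', the mean climb rate."""
    return mean(features)


def interpret(pattern, limit_fpm=700.0):
    """Step h: judge the pattern against a hypothetical domain limit."""
    return "review" if pattern > limit_fpm else "normal"


cleaned = clean(select(RAW))
climb_rates = reduce_features(cleaned)
result = interpret(mine(climb_rates))
print(cleaned, climb_rates, result)
```

Steps a and i (understanding the domain and acting on the result) happen outside the code, which is why domain input such as the limit appears only as a parameter.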
By taking advantage of information and communication technologies, the knowledge discovery process has been used to extract useful information from massive data in different fields, such as marketing, finance, sports, astronomy, and science exploration. In other words, the knowledge discovery process is applicable to a wide range of data-driven cases given appropriate design and implementation, and the aviation industry is no exception. Many studies have adopted the knowledge discovery process to develop advanced aviation data analysis techniques. This article reviews recent studies on flight data analysis and presents two example studies that apply the knowledge discovery process to flight performance analysis in general aviation (GA).

Review of Flight Data Analysis in GA
The United States has the largest and most diverse GA community in the world, which plays an important role in noncommercial business aviation, aerial work, instructional flying, and pleasure flying [4]. Over recent decades, GA accident rates have shown a decreasing trend, but an estimated 347 people were still killed in 209 GA accidents in 2017 [5]. Reducing GA accident rates has been a challenge for many years. The FAA and industry have been working on several initiatives to improve GA safety, such as the General Aviation Joint Steering Committee (GAJSC), Equip 2020 for ADS-B Out, new Airman Certification Standards (ACS), and the Got Data? External Data Initiative [5]. Compared with the last century, fewer aviation accidents now share common causes. Traditional safety improvement strategies that rely on reactively investigating aircraft accidents and incidents are no longer enough to further improve aviation safety. Therefore, government and the aviation industry have steered safety enhancement strategies from reactive approaches to proactive approaches [6]. Given the effectiveness of Flight Data Monitoring/Flight Operational Quality Assurance (FDM/FOQA) programs in improving commercial aviation safety, the FAA and industry are also focused on reducing the GA accident rate primarily through voluntary, non-regulatory, proactive, data-driven strategies [5]. For example, de-identified GA operational data were used in the Aviation Safety Information Analysis and Sharing (ASIAS) program to identify risks before they cause accidents [5]. The National General Aviation Flight Information Database (NGAFID) was launched as a joint FAA-industry initiative designed to bring voluntary FDM to general aviation, and a datalink between ASIAS and the NGAFID was built by the University of North Dakota in 2013 [7].
Today, FDM is also known as flight data analysis or operational flight data monitoring (OFDM) under the framework of the International Civil Aviation Organization (ICAO) and other civil aviation authorities, as shown in Figure 2 [8]. Although the features of each program may vary, most are built on two primary approaches: the exceedance detection approach and the statistical analysis approach [9]. Exceedance detection looks for deviations from flight manual limits and standard operating procedures (SOPs), that is, for predefined undesired safety occurrences. It monitors aircraft parameters of interest and triggers a warning or draws the attention of safety specialists when parameters reach preset limits or baselines under certain conditions. For example, the program can be set to detect events in which the aircraft's speed, altitude, or attitude exceeds a predefined threshold.
4(2): 555633. DOI: 10.19080/TTSR.2020.04.555633
Statistical analysis approaches are used to create flight profiles, plot the distributions and trends of certain types of flight parameters, or map flight tracks on geo-referenced charts to examine particular operational features of a flight. By using statistical analysis approaches, aviation operators not only obtain numeric features of flight operations but also acquire a more comprehensive picture of those operations based on the distributions of aggregated flight data [9]. Statistical analysis is a tool for looking at overall performance and determining the critical safety concerns for flight operations. In addition, both exceedance analysis and statistical analysis can drill into the data for a specific target, such as phase of flight, airport, or aircraft type.
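The two approaches can be contrasted in a short sketch. Everything here is an assumption made for illustration: the `Exceedance` record, the `detect_exceedances` function, the synthetic airspeed samples, and the `LIMIT_KT` value, which is not taken from any flight manual.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Exceedance:
    parameter: str
    start_s: float
    end_s: float
    peak: float


def detect_exceedances(times, values, parameter, limit):
    """Group consecutive samples above `limit` into exceedance events."""
    events = []
    current = None
    for t, v in zip(times, values):
        if v > limit:
            if current is None:
                # First sample above the limit opens a new event.
                current = Exceedance(parameter, t, t, v)
            else:
                current.end_s = t
                current.peak = max(current.peak, v)
        elif current is not None:
            # Parameter returned below the limit: close the event.
            events.append(current)
            current = None
    if current is not None:
        events.append(current)
    return events


# Synthetic indicated airspeed samples in knots, one per second.
times = [0, 1, 2, 3, 4, 5, 6, 7]
airspeed = [150, 158, 166, 171, 164, 159, 167, 155]
LIMIT_KT = 163  # hypothetical threshold, for illustration only

# Exceedance detection: discrete events against a preset limit.
events = detect_exceedances(times, airspeed, "airspeed_kt", LIMIT_KT)
for event in events:
    print(event)

# Statistical analysis: a summary of the whole distribution instead.
print(f"mean={mean(airspeed):.2f} kt, std={pstdev(airspeed):.2f} kt")
```

Exceedance detection answers whether a limit was crossed and for how long, while the statistical summary describes the whole distribution regardless of any limit.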

Many observations in aviation data are spatially or temporally related. For instance, aircraft flight parameters captured by onboard flight data recorders, radar tracks, and aircraft GPS position data all take the form of sequential observations. In addition to the two prevalent flight data analysis approaches described above for flight safety assurance, many other data analysis techniques are being developed and used for more specific objectives.

Exceedance Detection of GA Flight Operations
Flight data analysis is an effective strategy for proactive safety management in aviation. In addition to Part 121 commercial air carriers, Part 135 operations are also strongly encouraged to adopt flight data monitoring (FDM), which is listed among the most wanted transportation safety improvements [10]. However, implementing flight data analysis requires significant investment in flight data recording technology, data transfer, professional software, and labor for data analytics. Because of this high cost, only 53 air transportation service operators in the U.S. have implemented a FOQA program [11]. Moreover, most flight data analysis strategies for commercial air carriers adopt a Ground Data Replay and Analysis System (GDRAS), which is typically proprietary software with predesigned functionalities that relies on a large number of flight parameters fed from advanced flight data recorders. Because GA operators have limited resources, less sophisticated onboard avionics, and diverse operational characteristics, traditional flight data analyses are usually unaffordable and too inflexible to meet their needs.
An innovative flight exceedance detection strategy was explored based on the knowledge discovery process and a next-generation air traffic surveillance technology, automatic dependent surveillance-broadcast (ADS-B) [12,13]. This strategy is expected to provide inexpensive flight data analysis, particularly for GA operations, by eliminating the dependency on proprietary GDRAS and the investment in expensive onboard flight data recorders. These studies collected aircraft operational data from ADS-B and followed the knowledge discovery process to preprocess, transform, and analyze the data, as shown in Figure 3.
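As a rough illustration of the preprocessing and transformation steps, the sketch below derives a vertical rate from successive position reports and skips any pair separated by a reception gap. It is not the procedure used in the cited studies: the message fields `t` and `alt_ft`, the `max_gap_s` value, the descent limit, and the data are all assumptions.

```python
def vertical_rates(messages, max_gap_s=5.0):
    """Return (time_s, rate_fpm) pairs from consecutive reports.

    Pairs that are out of order or separated by more than `max_gap_s`
    are skipped rather than interpolated across.
    """
    rates = []
    for prev, curr in zip(messages, messages[1:]):
        dt = curr["t"] - prev["t"]
        if dt <= 0 or dt > max_gap_s:
            continue  # missing or unusable data between these reports
        rate = 60.0 * (curr["alt_ft"] - prev["alt_ft"]) / dt
        rates.append((curr["t"], rate))
    return rates


# Synthetic reports with a hypothetical layout; note the 12 s gap.
messages = [
    {"t": 0.0, "alt_ft": 3000.0},
    {"t": 1.0, "alt_ft": 2990.0},
    {"t": 2.0, "alt_ft": 2975.0},
    {"t": 14.0, "alt_ft": 2700.0},
    {"t": 15.0, "alt_ft": 2680.0},
]

rates = vertical_rates(messages)

# Share of consecutive pairs that were usable for the derived metric.
coverage = len(rates) / (len(messages) - 1)

# Flag descents steeper than a hypothetical limit of -1000 ft/min.
DESCENT_LIMIT_FPM = -1000.0
flagged = [t for t, rate in rates if rate < DESCENT_LIMIT_FPM]
print(rates, coverage, flagged)
```

Skipping gaps keeps a lost message from producing a spurious rate, at the cost of leaving part of the flight unmeasured; the `coverage` figure makes that cost visible.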

The exceedance detection procedure used in the study is shown in Figure 4. In total, a set of 29 flight metrics was developed from the content of ADS-B data for exceedance detection and flight performance measurement. Five flight exceedances were identified from the aircraft operations manual and airplane information manual for the experiment. The study results show that certain types of exceedances could be detected more accurately than others using ADS-B data. The primary reason is missing values in the ADS-B data, which are transmitted wirelessly on 1090 MHz or 978 MHz. However, flight data analysis using ADS-B data is expected to be a promising strategy with further research and development.
With the modernization of the GA fleet, more and more GA aircraft are equipped with advanced digital flight data recorders, which provide quick access to GA flight data. Aircraft operational data are therefore becoming more accessible for flight performance analysis. One such study explored the fuel consumption efficiency of GA piston-engine aircraft by discovering the relationship between aircraft operational parameters [14]. Following the knowledge discovery process, 22 sets of flight operational data with 176,370 observations were collected from the Garmin G1000 avionics system installed on a GA piston-engine aircraft, the Cirrus SR20, and were transformed and analyzed with machine learning techniques. The statistical relationship between the fuel flow rate and three aircraft parameters (ground speed, flight altitude, and vertical speed) was modeled. Classification and Regression Trees (CART) were used to predict the fuel flow rate from the three explanatory aircraft parameters, as shown in Figure 5.
With this model, GA operators could intuitively estimate the fuel flow rate of an aircraft at any given time from only three other aircraft parameters, which can be acquired in real time or post-flight from many available aeronautical technologies. In addition, the analyses in this study show that aircraft ground speed and vertical speed have a greater impact on the fuel flow rate than flight altitude, which provides GA operators with important intelligence for optimizing fuel consumption efficiency [14].
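To make the idea of a regression tree concrete, the following is a minimal hand-written sketch that splits on whichever feature and threshold most reduce the squared error. It is not the model from the cited study: the six training rows are synthetic and not representative of any aircraft, and the function names and the `max_depth` setting are assumptions.

```python
def sse(values):
    """Sum of squared errors of `values` around their mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets):
    """Return (cost, feature, threshold) minimizing the children's total SSE."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[f] <= threshold]
            right = [y for r, y in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put samples on both sides
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, threshold)
    return best


def build_tree(rows, targets, depth=0, max_depth=2):
    """Grow a regression tree; each leaf stores the mean target value."""
    split = best_split(rows, targets) if depth < max_depth else None
    if split is None:
        return sum(targets) / len(targets)
    _, f, threshold = split
    pairs = list(zip(rows, targets))
    left = [(r, y) for r, y in pairs if r[f] <= threshold]
    right = [(r, y) for r, y in pairs if r[f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean."""
    while isinstance(tree, dict):
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree


# Synthetic rows: (ground_speed_kt, altitude_ft, vertical_speed_fpm).
rows = [
    (90, 2000, 500), (95, 2500, 600),      # climb
    (140, 5000, 0), (145, 5500, 0),        # cruise
    (120, 3000, -500), (125, 2500, -600),  # descent
]
fuel_flow_gph = [14.0, 14.5, 10.0, 10.5, 7.0, 7.5]  # made-up values

tree = build_tree(rows, fuel_flow_gph)
estimate = predict(tree, (142, 5200, 0))
print(tree, estimate)
```

A library implementation would normally be preferred for real data; the hand-written version is used here only so that the splitting logic is visible.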

Discussion and Conclusion
This article reviews recent progress in aviation data analysis and two example studies that apply the knowledge discovery process to GA flight performance analysis from different perspectives. The two example studies illustrate how the knowledge discovery process addresses different practical demands in flight performance analysis, namely safety measurement and operational efficiency monitoring. The knowledge discovery process has been widely practiced in many non-aeronautical fields by taking advantage of improvements in information and communication technologies. As an important part of the transportation industry, aviation has been incorporating more data-driven strategies in operations, management, and safety. The knowledge discovery process can incorporate different data sources and can therefore support diverse knowledge discovery purposes. In the knowledge discovery process, data are selected according to the analysis objectives and analyzed from different viewpoints to discover interesting patterns, driven by the main goal of supporting better decision-making. All of these features make it a promising strategy in the world of air transportation.
While knowledge discovery is a promising strategy for extracting useful information from massive aviation data, applying it to a specific objective relies on good domain knowledge and on properly addressing the constraints in each step of the process. First, domain knowledge generally determines how practical the entire knowledge discovery process is and how useful the output knowledge can be. Starting from the determination of target data and the choice of appropriate data mining techniques, solid domain knowledge decides whether the selected target data and analytic strategies fit the desired research objectives. In the later steps of interpreting and reporting the outcomes, domain knowledge arbitrates whether explanations of the discovered knowledge are applicable and valuable given the research background. Second, database issues, such as missing values, constrain the effectiveness of knowledge discovery projects. Data analysts should take these factors into account when practicing the knowledge discovery process in the field of aviation.