Preventive detection of failures

Solution context
The client manages the production and maintenance of jet engines. Each engine is characterized by a set of anonymized metrics and has between 20 and 50 sensors (depending on the engine type) attached to its internal components. The sensor data are time-stamped, but their meaning is anonymized and their scale modified to protect industrial secrets. This solution gave the client the ability to:
● Perform exploratory analysis of the different engine and sensor data in the context of the history of maintenance interventions carried out.
● Prepare a general model, or one model per engine type, to predict failures 100 operating cycles in advance (anonymized time scale).
Technical Process
Data Acquisition:
During this phase, we transformed the different data formats available in the client’s data lake into a coherent dataset. Our approach was to set up automated jobs that collect the data and map it to a unified schema.
Transformation:
The jobs developed are of two types:
● Configuration jobs: responsible for collecting data on the engines, their configuration, and their operating environments.
● Streaming jobs: responsible for the near-real-time loading of sensor data while the engines are operating.
The goal is to filter and transform the anonymized data to obtain a batch large enough for analysis and prediction.
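To illustrate what mapping heterogeneous sources to a unified schema can look like, here is a minimal sketch. The two input formats (a CSV line and a JSON-like record), the field names, and the `SensorReading` type are all hypothetical; the actual formats in the client's data lake are not described here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class SensorReading:
    """Hypothetical unified schema: one row per engine, sensor, and cycle."""
    engine_id: str
    sensor_id: str
    cycle: int
    value: float


def from_csv_line(line: str) -> SensorReading:
    """Normalize an assumed 'engine,sensor,cycle,value' CSV line."""
    engine_id, sensor_id, cycle, value = line.strip().split(",")
    return SensorReading(engine_id, sensor_id, int(cycle), float(value))


def from_json_record(record: Dict[str, Any]) -> List[SensorReading]:
    """Normalize an assumed record holding all sensor values for one cycle."""
    return [
        SensorReading(str(record["engine"]), sensor_id, int(record["cycle"]), float(value))
        for sensor_id, value in record["sensors"].items()
    ]


# Both sources end up as the same type, so downstream jobs see a single schema.
readings = [from_csv_line("E1,s03,12,0.42")]
readings += from_json_record({"engine": "E2", "cycle": 7, "sensors": {"s01": 1.5, "s02": 2}})
```

Keeping one normalizer per source format means a new format only requires one new function, while the analysis code keeps working on `SensorReading` rows.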
Exploratory Analysis
The collected data went through the following analyses:
● Univariate analysis: explores the values taken by each variable, studies their distributions, and identifies outliers.
● Bivariate analysis: extracts the relationships between pairs of variables and studies their correlations, to avoid introducing bias through strongly correlated variables.
● Multivariate analysis: applied to the engine and environmental data, this analysis reduced the dimensionality of the problem and provided additional variables to complement the sensor data.
● Time series analysis: applied to the sensor data, this analysis extracted the operating modes and the different cycles apparent in the data.
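The bivariate step can be sketched as a simple correlation filter. This is only an illustration under assumptions: the 0.95 threshold, the sensor names, and the greedy "keep the first, drop later duplicates" rule are not taken from the project.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def drop_correlated(columns: Dict[str, Sequence[float]], threshold: float = 0.95) -> List[str]:
    """Greedily keep a variable only if it is not strongly correlated with one already kept."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical sensors: s2 is a rescaled copy of s1, s3 is only weakly related.
sensors = {
    "s1": [1, 2, 3, 4, 5],
    "s2": [2, 4, 6, 8, 10],
    "s3": [5, 1, 4, 2, 3],
}
print(drop_correlated(sensors))
```

Because the sensor scales were modified for anonymization, a scale-invariant measure such as Pearson correlation is a reasonable fit for this kind of check.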
Result:

The colors correspond to the different engine configurations.

Prediction Model
The goal of the prediction is to recognize a failure at least 100 operating cycles in advance. We started with classical time series prediction models (moving average, SARIMA, etc.). The fundamental problem with these methods is the lack of seasonality in the data, which yields a model that converges to the overall average of the future data.
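One common way to frame a "failure within 100 cycles" objective is as a binary label per cycle. The sketch below assumes, hypothetically, that each engine's history is recorded up to the cycle where it fails; the function name and this run-to-failure assumption are illustrative and not taken from the project.

```python
from typing import List


def label_failure_within(n_cycles: int, horizon: int = 100) -> List[int]:
    """Return one label per cycle (1..n_cycles).

    Assumes the engine fails at its last recorded cycle. A cycle is labeled 1
    when the failure is at most `horizon` cycles away, and 0 otherwise.
    """
    return [
        1 if (n_cycles - cycle) <= horizon else 0
        for cycle in range(1, n_cycles + 1)
    ]


# Hypothetical engine observed for 250 cycles before failing.
labels = label_failure_within(250, horizon=100)
print(sum(labels), "cycles fall inside the 100-cycle warning window")
```

With labels of this form, the problem becomes a classification task on the sensor values at each cycle rather than a forecast of the raw series, which sidesteps the lack of seasonality noted above.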
The boosting model with feedback allowed us to reach a precision of 85% in detecting failures 100 cycles ahead. The model also highlighted the relative importance of the different sensors (below).
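To show how a boosting model can rank sensors by importance, here is a toy gradient boosting sketch in pure Python, built from one-split regression stumps with squared loss. It is not the model used in the project: the "feedback" mechanism is not reproduced, the data are made up, and importance is simply the share of error reduction attributed to each feature.

```python
from typing import List, Optional, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)
Model = Tuple[float, float, List[Stump]]  # (base prediction, learning rate, stumps)


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def _sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean."""
    m = _mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(X: List[List[float]], residuals: List[float]) -> Tuple[Optional[Stump], float]:
    """Find the single split that most reduces squared error on the residuals."""
    base_sse = _sse(residuals)
    best, best_gain = None, 1e-12
    for j in range(len(X[0])):
        # Excluding the largest value guarantees a non-empty right side.
        for threshold in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            gain = base_sse - _sse(left) - _sse(right)
            if gain > best_gain:
                best, best_gain = (j, threshold, _mean(left), _mean(right)), gain
    return best, best_gain


def fit_boosting(X: List[List[float]], y: List[float],
                 n_rounds: int = 20, learning_rate: float = 0.5) -> Tuple[Model, List[float]]:
    """Fit stumps on successive residuals; return the model and normalized importances."""
    base = _mean(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]
        stump, gain = fit_stump(X, residuals)
        if stump is None:  # no split improves the fit any further
            break
        j, thr, left_value, right_value = stump
        importance[j] += gain
        stumps.append(stump)
        pred = [p + learning_rate * (left_value if row[j] <= thr else right_value)
                for row, p in zip(X, pred)]
    total = sum(importance) or 1.0
    return (base, learning_rate, stumps), [g / total for g in importance]


def predict(model: Model, row: List[float]) -> float:
    base, learning_rate, stumps = model
    score = base
    for j, thr, left_value, right_value in stumps:
        score += learning_rate * (left_value if row[j] <= thr else right_value)
    return score


# Made-up data: sensor 0 drives the failure label, sensor 1 is noise.
X_demo = [[0.1, 0.9], [0.2, 0.1], [0.3, 0.8], [0.7, 0.2], [0.8, 0.7], [0.9, 0.3]]
y_demo = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
demo_model, demo_importance = fit_boosting(X_demo, y_demo)
print("relative importance per sensor:", demo_importance)
```

In practice a library implementation of gradient boosting would replace this toy version; the point here is only that each split's error reduction can be accumulated per sensor to produce an importance ranking.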

