wtphm.pred_processing
This module contains functions for processing SCADA data before it is used for fault detection or prognostics. Read more in the “Labelling the SCADA data” section of the User Guide.
wtphm.pred_processing.label_stoppages(scada_data, fault_batches, drop_fault_batches=True, label_pre_stop=True, pre_stop_lims=['90 minutes', 0], oth_batches_to_drop=None, drop_type=None)
Label the times in the SCADA data which occurred during a stoppage, or in the lead-up to one, as such.
This adds a “stoppage” column to the passed scada_data, plus an optional “pre_stop” column. “stoppage” is set to 1 if the SCADA sample in question occurs during a stoppage, and “pre_stop” is set to 1 for the samples leading up to a stoppage. Both are 0 otherwise. How these labels are assigned depends on the parameters described below. A “batch_id” column is also added. For entries where “pre_stop” or “stoppage” is 1, “batch_id” identifies the batch that gave the entry that label.
Parameters:
- scada_data (pandas.DataFrame) – Full set of SCADA data for the turbine.
- fault_batches (pandas.DataFrame) – The dataframe of batches of fault events, a subset of the output of wtphm.batch.get_batch_data().
- drop_fault_batches (bool, default=True) – Whether to drop the SCADA entries which correspond to the stoppage periods covered by fault_batches, i.e. not the pre-fault data, but the fault data itself. This is highly recommended, as otherwise the stoppages themselves will be kept in the returned data, though the “stoppage” column for these entries will be labelled 1, while the fault-free data will be labelled 0.
- label_pre_stop (bool, default=True) – If True, add a “pre_stop” column to the returned scada_data_l. Samples in the time leading up to a stoppage are given label 1, and 0 otherwise.
- pre_stop_lims (2*1 list of pd.Timedelta-compatible strings, default=[‘90 minutes’, 0]) – The amount of time before a stoppage to label SCADA samples as “pre_stop”. E.g., by default, “pre_stop” is labelled as 1 in the time between 90 minutes and 0 minutes before the stoppage occurs. If [‘120 mins’, ‘20 mins’] is passed, SCADA samples from 120 minutes before until 20 minutes before the stoppage are given the “pre_stop” label 1.
- oth_batches_to_drop (pd.DataFrame, optional; default=None) – Additional batches which should be dropped from the SCADA data, independent of whether the fault_batches are dropped via drop_fault_batches. If this is passed, drop_type must be given a string as well.
- drop_type (str, optional; default=None) – Only used when oth_batches_to_drop has been passed. If ‘both’, the stoppage and pre-stop entries (according to pre_stop_lims) corresponding to batches in oth_batches_to_drop are dropped from the SCADA data. If ‘stop’, only the stoppage entries are dropped. If ‘pre’, only the pre-stop entries are dropped.
Returns: scada_data_l (pd.DataFrame) – The original scada_data dataframe with the “pre_stop”, “stoppage” and “batch_id” columns added.
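
The labelling described above can be illustrated with a minimal pandas sketch. This is not wtphm's implementation: the “time” column name, the single hand-written stoppage, and the choice of which window boundaries are inclusive are all assumptions made for the example. It only shows how a pre_stop_lims pair translates into a labelled window before a stoppage, and what dropping the stoppage rows leaves behind.

```python
import pandas as pd

# Hypothetical 10-minute SCADA samples from 00:00 to 02:50.
# The "time" column name is an assumption for this sketch.
scada = pd.DataFrame(
    {"time": pd.date_range("2020-01-01 00:00", periods=18, freq="10min")}
)

# One hypothetical stoppage, running from 02:00 to 02:30.
stop_start = pd.Timestamp("2020-01-01 02:00")
stop_end = pd.Timestamp("2020-01-01 02:30")

# Same shape as the documented default: 90 minutes before until 0 minutes before.
pre_stop_lims = ["90 minutes", 0]
window_start = stop_start - pd.Timedelta(pre_stop_lims[0])
window_end = stop_start - pd.Timedelta(pre_stop_lims[1])

# Samples during the stoppage get stoppage == 1 (inclusive bounds assumed).
scada["stoppage"] = (
    (scada["time"] >= stop_start) & (scada["time"] <= stop_end)
).astype(int)

# Samples inside the pre-stop window get pre_stop == 1
# (window start inclusive, window end exclusive: an assumption).
scada["pre_stop"] = (
    (scada["time"] >= window_start) & (scada["time"] < window_end)
).astype(int)

# Mimic the effect of drop_fault_batches=True: remove the stoppage rows themselves.
scada_l = scada[scada["stoppage"] == 0]

print(scada_l[["time", "pre_stop"]])
```

Passing ["120 mins", "20 mins"] as pre_stop_lims in this sketch would shift the labelled window so that it ends 20 minutes before the stoppage begins, leaving the final two pre-stoppage samples unlabelled.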
wtphm.pred_processing.get_lagged_features(X, y, features_to_lag_inds, steps)
Return an array in which selected columns are also included as lagged features, for use in classification.
Parameters:
- X (m*n np.ndarray) – The input features, with m samples and n features.
- y (m*1 np.ndarray) – The m target values.
- features_to_lag_inds (np.array) – The indices of the columns in X which will be lagged.
- steps (int) – The number of lagging steps. This means for feature ‘B’ at time T, features will be added to X at T for B@(T-1), B@(T-2)…B@(T-steps).
Returns:
- X_lagged (np.ndarray) – An array with the original features and the lagged features appended. The number of samples will necessarily be decreased, because some samples at the start would have NA values for the lagged features.
- y_lagged (np.ndarray) – An updated array of target values corresponding to the new number of samples in X_lagged.
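
The lagging scheme described above can be sketched in NumPy as follows. The function name lag_features_sketch is hypothetical and this is an illustrative re-implementation, not wtphm's own code. In particular, the order in which the lagged columns are appended (all lag-1 columns, then all lag-2 columns, and so on) is an assumption.

```python
import numpy as np


def lag_features_sketch(X, y, features_to_lag_inds, steps):
    """Illustrative only: append lagged copies of selected columns to X."""
    X = np.asarray(X)
    y = np.asarray(y)
    m = X.shape[0]
    # The first `steps` rows have no complete history, so they are dropped.
    blocks = [X[steps:]]
    for lag in range(1, steps + 1):
        # Row i of this block holds the selected features from `lag`
        # time steps before row i of X[steps:].
        blocks.append(X[steps - lag:m - lag][:, features_to_lag_inds])
    return np.hstack(blocks), y[steps:]


# Five samples, two features; only column 1 (feature 'B') is lagged.
X = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]])
y = np.array([0, 0, 1, 1, 0])

X_lagged, y_lagged = lag_features_sketch(X, y, np.array([1]), steps=2)
print(X_lagged)
# [[ 3 30 20 10]
#  [ 4 40 30 20]
#  [ 5 50 40 30]]
print(y_lagged)
# [1 1 0]
```

With steps=2, the first two samples are lost, which is why both returned arrays have three rows instead of five.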