wtphm.batch
This module contains functions for creating the batch_data.
See more in the Overview.
wtphm.batch.get_grouped_event_data(event_data, code_groups, fault_codes)
Groups together similar event codes as the same code.
This returns the events dataframe but with some fault events which have different but similar codes and descriptions grouped together and relabelled to have the same code and description.
More info in the Group Faults of the Same Type section of the user guide.
Parameters:
- event_data (pandas.DataFrame) – The original events/fault data.
- fault_codes (numpy.ndarray) – All event codes that will be treated as fault events for the batches.
- code_groups (list-like, optional, default=None) – The groups of similar events with similar codes/descriptions. Must be a list or list-of-lists, e.g. [[10, 11, 12], [24, 25], [56, 57, 58]] or [10, 11, 12].
Returns:
- grouped_event_data (pandas.DataFrame) – The event_data, but with codes and descriptions from code_groups changed so that similar ones are identical.
- grouped_fault_codes (pandas.DataFrame) – The fault_codes, but with the similar codes in each group treated as identical.
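The relabelling idea can be illustrated with plain pandas. This is only a sketch and not wtphm's implementation: the column names code and description, the helper name group_codes_sketch, and the choice to relabel every member of a group to the group's first code are all assumptions made for the example.

```python
import pandas as pd


def group_codes_sketch(event_data, code_groups):
    """Relabel every code in a group to the group's first code (illustrative only)."""
    # Accept either a single group ([10, 11, 12]) or a list of groups.
    if code_groups and not isinstance(code_groups[0], (list, tuple)):
        code_groups = [code_groups]
    grouped = event_data.copy()
    for group in code_groups:
        target = group[0]
        # Assumes at least one event with the target code exists in the data.
        target_desc = grouped.loc[grouped["code"] == target, "description"].iloc[0]
        mask = grouped["code"].isin(group)
        grouped.loc[mask, "code"] = target
        grouped.loc[mask, "description"] = target_desc
    return grouped


# Hypothetical events table: codes 10, 11 and 12 describe similar faults.
events = pd.DataFrame({
    "code": [10, 11, 12, 24, 99],
    "description": ["pitch A", "pitch B", "pitch C", "yaw A", "ok"],
})
grouped = group_codes_sketch(events, [[10, 11, 12]])
```

After the call, the first three rows of grouped share code 10 and the description "pitch A", while events itself is left unchanged.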
wtphm.batch.get_batch_data(event_data, fault_codes, ok_code, t_sep_lim='12 hour')
Get the distinct batches of events as they appear in the event_data.
Each batch is a group of fault events that occurred during a fault-related shutdown. A batch always begins with a fault event from one of the codes in fault_codes, and ends with the code ok_code, which signifies the turbine returning to normal operation.
More info can be found in Creating Batches.
Parameters:
- event_data (pandas.DataFrame) – The original events/fault data.
- fault_codes (numpy.ndarray) – All event codes that will be treated as fault events for the batches.
- ok_code (int) – A code which signifies the turbine returning to normal operation after being shut down or curtailed due to a fault or otherwise.
- t_sep_lim (str, default='12 hour', must be compatible with pd.Timedelta) – If a batch ends, and a second batch begins less than t_sep_lim afterwards, then the two batches are treated as one. This treats the turbine coming back online and immediately faulting again as one continuous batch. The effect is stacked, so that if a third fault event happens less than t_sep_lim after the second, all three are treated as the same continuous batch.
Returns: batch_data (pd.DataFrame) – DataFrame with the following headings:
- turbine_num: turbine number of the batch
- fault_root_codes: the fault codes present at the first timestamp in the batch
- all_root_codes: all event start codes present at the first timestamp in the batch
- start_time: start of first event in the batch
- fault_end_time: time_on of the last fault event in the batch
- down_end_time: the time_on of the last event in the batch, i.e. the last ok_code event in the batch
- fault_dur: duration from start of first fault event to start of final fault event in the batch
- down_dur: duration of total downtime in the batch, i.e. from start of first fault event to start of last ok_code event
- fault_event_ids: indices in the events data of faults that occurred
- all_event_ids: indices in the events data of all events (fault or otherwise) that occurred during the batch
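The stacking behaviour of t_sep_lim is easiest to see on a list of already-identified shutdown periods. The following is a minimal sketch of the merging rule only, assuming each batch is a (start, end) pair of timestamps; the helper merge_close_batches is hypothetical and is not part of wtphm.

```python
import pandas as pd


def merge_close_batches(batches, t_sep_lim="12 hour"):
    """Merge consecutive (start, end) periods separated by less than t_sep_lim."""
    limit = pd.Timedelta(t_sep_lim)
    merged = []
    for start, end in sorted(batches):
        if merged and start - merged[-1][1] < limit:
            # Short gap: extend the previous batch instead of opening a new one.
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return merged


ts = pd.Timestamp
batches = [
    (ts("2020-01-01 00:00"), ts("2020-01-01 02:00")),
    (ts("2020-01-01 05:00"), ts("2020-01-01 06:00")),  # 3 h after the first ends
    (ts("2020-01-01 10:00"), ts("2020-01-01 11:00")),  # 4 h after the second ends
    (ts("2020-01-03 00:00"), ts("2020-01-03 01:00")),  # more than 12 h later
]
merged = merge_close_batches(batches)
```

With the 12-hour limit the first three periods collapse into a single batch running from 00:00 to 11:00 on 1 January, and the fourth stays separate. With t_sep_lim="1 hour" all four remain distinct.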
wtphm.batch.get_batch_stop_cats(batch_data, event_data, scada_data, grid_col, maint_col, rep_col, grid_cval=0, maint_cval=0, rep_cval=0)
Labels the batches with an assumed stop category, based on the stop categories of the root event(s) which triggered them, i.e. the one or more events occurring simultaneously which caused the turbine to stop (items lower down supersede those higher up):
- If all root events in the batch are “normal” events, then the batch is labelled normal
- Otherwise, label as the most common stop cat in the initial events
- If a single sensor category event is present, label sensor
- If a single grid category event is present, label grid. Also label grid if the grid counter was active in the scada data. This is a timer indicating how long the turbine was down due to grid issues, used for calculating contract availability
- If the maintenance counter was active in the scada data, label maint
- There is an additional column labelled “repair”. If the repair counter was active, the turbine was brought down for repairs, and this is given the value “TRUE” for these times.
Parameters:
- batch_data (pd.DataFrame) – The batch data.
- event_data (pd.DataFrame) – The events data.
- scada_data (pd.DataFrame) – The scada data.
- grid_col, maint_col, rep_col (string) – The columns of scada_data which contain availability counters for grid issues, turbine maintenance and repairs, respectively.
- grid_cval, maint_cval, rep_cval (int) – The minimum total sum of the grid, maintenance and repair counters, respectively, throughout the duration of a batch for it to be marked as grid, maintenance or repair.
Returns: batch_data_sc (pd.DataFrame) – The original batch_data DataFrame, but with the following headings added:
- batch_cat: The stop categories of each batch
- repairs: The repair status of each batch
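The "lower items supersede higher ones" ordering of the labelling rules can be expressed as a chain of overwrites. The sketch below is an assumption-laden illustration rather than the library's code: label_batch_sketch is a hypothetical helper, the category name "fault" is invented for the example, the two boolean flags stand in for the SCADA counter checks, ties between equally common categories are not handled, and the separate repair column is left out.

```python
from collections import Counter


def label_batch_sketch(root_cats, grid_active=False, maint_active=False):
    """Apply the labelling rules top to bottom; later rules overwrite earlier ones."""
    if all(cat == "normal" for cat in root_cats):
        label = "normal"
    else:
        # Most common stop category among the root events.
        label = Counter(root_cats).most_common(1)[0][0]
    if "sensor" in root_cats:
        label = "sensor"
    if "grid" in root_cats or grid_active:
        label = "grid"
    if maint_active:
        label = "maint"
    return label


labels = [
    label_batch_sketch(("normal", "normal")),
    label_batch_sketch(("fault", "fault", "test")),
    label_batch_sketch(("fault", "fault", "sensor")),
    label_batch_sketch(("fault", "sensor", "grid")),
    label_batch_sketch(("fault",), grid_active=True),
    label_batch_sketch(("grid",), maint_active=True),
]
```

Each later check simply reassigns label, which is what gives the maintenance counter the final say over grid, grid over sensor, and so on.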
wtphm.batch.get_root_cats(batch_data, event_data)
Gets the categories for the root alarms in the batch_data.
Parameters:
- batch_data (pd.DataFrame) – The batch data.
- event_data (pd.DataFrame) – The events data.
Returns: root_cats (pd.Series) – Series of tuples, where each tuple contains strings of the stop_cat values for each of the root alarms in a batch.
wtphm.batch.get_most_common_cats(root_cats)
Gets the most common root fault category from a Series of root alarm categories.
Parameters: root_cats (pd.Series) – Series of tuples, where each tuple contains strings of the stop_cat values for each of the root alarms in a batch.
Returns: most_common_cats (pd.Series) – Each entry in the series is a string containing the most commonly occurring root fault in cat_counts. In the case of a draw, both are added, e.g. 'test, grid'.
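The draw handling can be sketched with collections.Counter applied to each tuple. This is an illustrative approximation, not the library's code: most_common_cat is a hypothetical helper, the category names are made up, and listing tied categories in first-seen order is an assumption.

```python
from collections import Counter

import pandas as pd


def most_common_cat(cats):
    """Return the most frequent category, joining tied categories with ', '."""
    counts = Counter(cats)
    top = max(counts.values())
    # Counter keeps insertion order, so ties appear in first-seen order.
    return ", ".join(c for c, n in counts.items() if n == top)


root_cats = pd.Series([
    ("test", "grid"),              # draw between two categories
    ("fault", "fault", "sensor"),  # clear winner
    ("normal",),
])
most_common = root_cats.apply(most_common_cat)
```

The first entry of most_common is the joined string for the draw, and the other two entries are single category names.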
wtphm.batch.get_cat_all_ids(root_cats, cat)
Get an index of batches where a given category is the only one present in the categories of the root alarms.
Parameters:
- root_cats (pd.Series) – Series of strings, where each string is the categories of each of the root alarms in a batch, separated by commas.
- cat (string) – The category to check the presence of.
Returns: cat_present_idx (pd.Index) – The index of batch entries where cat was the only category present in the root_cats.
wtphm.batch.get_cat_present_ids(root_cats, cat)
Get an index of batches where a certain category is present in the categories of the root alarms.
Parameters:
- root_cats (pd.Series) – Series of strings, where each string is the categories of each of the root alarms in a batch, separated by commas.
- cat (string) – The category to check the presence of.
Returns: cat_present_idx (pd.Index) – The index of batch entries where cat was present in the root_cats.
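The difference between "present" (this function) and "only category present" (get_cat_all_ids) can be shown side by side. The sketch assumes root_cats holds comma-separated strings as described in the parameters; the helpers cat_present_sketch and cat_only_sketch and the sample categories are hypothetical.

```python
import pandas as pd


def _split(cats_string):
    """Split 'grid, sensor' into the set {'grid', 'sensor'}."""
    return {c.strip() for c in cats_string.split(",")}


def cat_present_sketch(root_cats, cat):
    """Index of batches where cat appears among the root categories."""
    mask = root_cats.apply(lambda s: cat in _split(s))
    return root_cats[mask].index


def cat_only_sketch(root_cats, cat):
    """Index of batches where cat is the only root category."""
    mask = root_cats.apply(lambda s: _split(s) == {cat})
    return root_cats[mask].index


root_cats = pd.Series(["grid, grid", "grid, sensor", "fault"], index=[10, 11, 12])
grid_present = cat_present_sketch(root_cats, "grid")
grid_only = cat_only_sketch(root_cats, "grid")
```

Batches 10 and 11 both contain a grid root alarm, but only batch 10 consists of grid alarms alone. Splitting on commas before comparing avoids accidental substring matches that a plain text search could produce.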
wtphm.batch.get_counter_active_ids(batch_data, scada_data, counter_col, counter_value=0)
Get an index of batches during which a certain scada counter was active.
In 10-minute SCADA data there are often counters for when the turbine was in various different states, for calculating contractual availability. This function finds the named counter_col in scada_data, and identifies any sample periods where this value was above counter_value. If any of these sample periods fall within a certain batch, then this function returns those batch ids.
Parameters:
- batch_data (pd.DataFrame) – The batches of events.
- scada_data (pd.DataFrame) – The 10-minute SCADA data.
- counter_col (string) – The column in the SCADA data with a counter.
- counter_value (int) – Any SCADA entries with a counter above this value will have their index returned.
Returns: counter_active_index (pd.Index) – The ids of counter_col entries in scada_data which have a value above counter_value.
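The matching of active counter samples to batches can be sketched as a filter followed by a time-window check. This is an illustration under stated assumptions rather than the library's implementation: the SCADA columns time and turbine_num, the counter column grid_counter and the helper counter_active_sketch are hypothetical, while start_time, down_end_time and turbine_num on the batch side follow the batch_data headings listed earlier.

```python
import pandas as pd


def counter_active_sketch(batch_data, scada_data, counter_col, counter_value=0):
    """Return ids of batches containing a SCADA sample with an active counter."""
    active = scada_data[scada_data[counter_col] > counter_value]
    hits = []
    for batch_id, batch in batch_data.iterrows():
        same_turbine = active["turbine_num"] == batch["turbine_num"]
        in_window = active["time"].between(batch["start_time"], batch["down_end_time"])
        if (same_turbine & in_window).any():
            hits.append(batch_id)
    return pd.Index(hits)


ts = pd.Timestamp
batch_data = pd.DataFrame({
    "turbine_num": [1, 1],
    "start_time": [ts("2020-01-01 00:00"), ts("2020-01-02 00:00")],
    "down_end_time": [ts("2020-01-01 03:00"), ts("2020-01-02 01:00")],
})
scada_data = pd.DataFrame({
    "turbine_num": [1, 1, 1],
    "time": [ts("2020-01-01 00:10"), ts("2020-01-01 12:00"), ts("2020-01-02 00:30")],
    "grid_counter": [600, 300, 0],
})
active_ids = counter_active_sketch(batch_data, scada_data, "grid_counter")
```

Only the first batch contains a sample with a non-zero counter, so active_ids holds the single batch id 0. The 12:00 sample is active but falls outside both batches, and the 00:30 sample on 2 January falls inside the second batch but has a counter of zero.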