wtphm.batch

This module contains functions for creating the batch_data.

See more in the Overview.

wtphm.batch.get_grouped_event_data(event_data, code_groups, fault_codes)

Groups together similar event codes as the same code.

This returns the events dataframe but with some fault events which have different but similar codes and descriptions grouped together and relabelled to have the same code and description.

More info in the Group Faults of the Same Type section of the user guide.

Parameters:

event_data (pandas.DataFrame) – The original events/fault data.
fault_codes (numpy.ndarray) – All event codes that will be treated as fault events for the batches
code_groups (list-like, optional, default=None) – The groups of similar events with similar codes/descriptions. Must be a list or list-of-lists, e.g. [[10, 11, 12], [24, 25], [56, 57, 58]] or [10, 11, 12].

Returns:

grouped_event_data (pandas.DataFrame) – The event_data, but with codes and descriptions from code_groups changed so that similar ones are identical
grouped_fault_codes (pandas.DataFrame) – The fault_codes, but with the similar codes in each group treated as identical
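
The grouping described above can be illustrated with a minimal sketch. It is not the library's implementation: the helper name group_similar_codes, the column names "code" and "description", and the choice of the first code in each group as the shared label are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def group_similar_codes(event_data, code_groups, fault_codes):
    """Relabel every code in a group with the group's first code (illustrative only)."""
    # Accept a single flat group such as [10, 11, 12] as well as a list of groups.
    if not isinstance(code_groups[0], (list, tuple)):
        code_groups = [code_groups]

    grouped = event_data.copy()
    grouped_codes = np.asarray(fault_codes).copy()

    for group in code_groups:
        rep = group[0]  # assumed representative code for the group
        rep_desc = grouped.loc[grouped["code"] == rep, "description"]
        in_group = grouped["code"].isin(group)
        # Reuse the representative's description if that code occurs in the data.
        if not rep_desc.empty:
            grouped.loc[in_group, "description"] = rep_desc.iloc[0]
        grouped.loc[in_group, "code"] = rep
        grouped_codes[np.isin(grouped_codes, group)] = rep

    return grouped, np.unique(grouped_codes)


# Hypothetical events table; real data will have more columns.
events = pd.DataFrame({
    "code": [10, 11, 24, 99],
    "description": ["Pitch fault A", "Pitch fault B", "Yaw fault A", "Turbine OK"],
})
grouped, codes = group_similar_codes(
    events, [[10, 11, 12], [24, 25]], np.array([10, 11, 12, 24, 25])
)
```

In this sketch, codes 10 and 11 end up sharing code 10 and its description, while events outside any group are left unchanged.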

wtphm.batch.get_batch_data(event_data, fault_codes, ok_code, t_sep_lim='12 hour')

Get the distinct batches of events as they appear in the event_data.

Each batch is a group of fault events that occurred during a fault-related shutdown. A batch always begins with a fault event from one of the codes in fault_codes, and ends with the code ok_code, which signifies the turbine returning to normal operation.

More info can be found in Creating Batches.

Parameters:

event_data (pandas.DataFrame) – The original events/fault data.
fault_codes (numpy.ndarray) – All event codes that will be treated as fault events for the batches
ok_code (int) – A code which signifies the turbine returning to normal operation after being shut down or curtailed due to a fault or otherwise
t_sep_lim (str, default='12 hour', must be compatible with pd.Timedelta) – If a batch ends and a second batch begins less than t_sep_lim afterwards, the two batches are treated as one, i.e. the turbine coming back online and immediately faulting again is treated as one continuous batch. This effect stacks: if a third fault event happens less than t_sep_lim after the second, all three are treated as the same continuous batch.

Returns:

batch_data (pd.DataFrame) – DataFrame with the following headings:

turbine_num: turbine number of the batch
fault_root_codes: the fault codes present at the first timestamp in the batch
all_root_codes: all event start codes present at the first timestamp in the batch
start_time: start of first event in the batch
fault_end_time: time_on of the last fault event in the batch
down_end_time: the time_on of the last event in the batch, i.e. the last ok_code event in the batch
fault_dur: duration from start of first fault event to start of final fault event in the batch
down_dur: duration of total downtime in the batch, i.e. from start of first fault event to start of last ok_code event
fault_event_ids: indices in the events data of faults that occurred
all_event_ids: indices in the events data of all events (fault or otherwise) that occurred during the batch
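
The stacking effect of t_sep_lim can be sketched as follows. This is only an illustration of the merging rule, not the library's code: the helper merge_close_batches is hypothetical, and each batch is reduced to a (start_time, down_end_time) pair for a single turbine.

```python
import pandas as pd


def merge_close_batches(batches, t_sep_lim="12 hour"):
    """Merge consecutive (start, end) batches separated by less than t_sep_lim."""
    limit = pd.Timedelta(t_sep_lim)
    merged = []
    for start, end in sorted(batches):
        if merged and start - merged[-1][1] < limit:
            # The turbine faulted again too soon after coming back online:
            # extend the previous batch instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


ts = pd.Timestamp
batches = [
    (ts("2020-01-01 00:00"), ts("2020-01-01 02:00")),
    (ts("2020-01-01 05:00"), ts("2020-01-01 06:00")),  # 3 h after the previous end
    (ts("2020-01-01 10:00"), ts("2020-01-01 11:00")),  # 4 h after the previous end
    (ts("2020-01-03 00:00"), ts("2020-01-03 01:00")),  # more than 12 h later
]
merged = merge_close_batches(batches)
```

With the 12-hour limit the first three batches collapse into one continuous batch and the fourth stays separate; with a 1-hour limit all four would remain distinct.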

wtphm.batch.get_batch_stop_cats(batch_data, event_data, scada_data, grid_col, maint_col, rep_col, grid_cval=0, maint_cval=0, rep_cval=0)

Labels the batches with an assumed stop category, based on the stop categories of the root event(s) which triggered them, i.e. the one or more events occurring simultaneously which caused the turbine to stop (items lower down supersede those higher up):

If all root events in the batch are “normal” events, then the batch is labelled normal
Otherwise, label as the most common stop cat in the initial events
If a single sensor category event is present, label sensor
If a single grid category event is present, label grid. Also label grid if the grid counter was active in the scada data. This is a timer indicating how long the turbine was down due to grid issues, used for calculating contract availability
If the maintenance counter was active in the scada data, label maint
There is an additional column labelled “repair”. If the repair counter was active, the turbine was brought down for repairs, and this is given the value “TRUE” for these times.

Parameters:

batch_data (pd.DataFrame) – The batch data
event_data (pd.DataFrame) – The events data
scada_data (pd.DataFrame) – The SCADA data
grid_col, maint_col, rep_col (string) – The columns of scada_data which contain availability counters for grid issues, turbine maintenance and repairs, respectively
grid_cval, maint_cval, rep_cval (int) – The minimum total sum of the grid, maintenance and repair counters, respectively, throughout the duration of a batch for it to be marked as grid, maintenance or repair

Returns:

batch_data_sc (pd.DataFrame) – The original batch_data DataFrame, but with the following headings added:

batch_cat: The stop categories of each batch
repairs: The repair status of each batch
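
The precedence of the labelling rules (later rules override earlier ones) can be sketched for a single batch as below. This is a simplified illustration under several assumptions: the helper label_batch is hypothetical, "a single event is present" is read as "at least one such root event", ties for the most common category are resolved by first appearance, and the category name "fault" is made up for the example.

```python
from collections import Counter


def label_batch(root_cats, grid_counter_active=False, maint_counter_active=False):
    """Apply the rules top to bottom; later matches overwrite earlier ones."""
    counts = Counter(root_cats)

    if set(root_cats) == {"normal"}:
        label = "normal"
    else:
        # Most common stop category among the root events
        # (ties resolved by first appearance in this sketch).
        label = counts.most_common(1)[0][0]

    if "sensor" in counts:
        label = "sensor"
    if "grid" in counts or grid_counter_active:
        label = "grid"
    if maint_counter_active:
        label = "maint"
    return label


examples = {
    "all normal": label_batch(("normal", "normal")),
    "mostly fault": label_batch(("fault", "fault", "normal")),
    "sensor present": label_batch(("fault", "fault", "sensor")),
    "grid overrides sensor": label_batch(("sensor", "grid")),
    "maint overrides grid": label_batch(("grid",), maint_counter_active=True),
}
```

Writing the rules as a sequence of independent if statements, rather than an if/elif chain, is what lets a rule lower in the list supersede one above it.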

wtphm.batch.get_root_cats(batch_data, event_data)

Gets the categories for the root alarms in the batch_data

Parameters:

batch_data (pd.DataFrame) – The batch data
event_data (pd.DataFrame) – The events data

Returns:

root_cats (pd.Series) – Series of tuples, where each tuple contains strings of the `stop_cat`s for each of the root alarms in a batch

wtphm.batch.get_most_common_cats(root_cats)

Gets the most common root fault category for each batch from the categories of its root alarms

Parameters:

root_cats (pd.Series) – Series of tuples, where each tuple contains strings of the `stop_cat`s for each of the root alarms in a batch

Returns:

most_common_cats (pd.Series) – Each entry in the series is a string containing the most commonly occurring root fault category for the corresponding batch in `root_cats`. In the case of a tie, all tied categories are included, e.g. 'test, grid'
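
A minimal sketch of the tie-handling behaviour described above, using collections.Counter. The helper most_common_cat is hypothetical, the category name "fault" is made up, and the order in which tied categories are joined (order of first appearance) is an assumption.

```python
from collections import Counter

import pandas as pd


def most_common_cat(cats):
    """Return the most frequent category, joining ties with ', '."""
    counts = Counter(cats)
    top = max(counts.values())
    # Counter keeps insertion order, so ties appear in order of first occurrence.
    return ", ".join(cat for cat, n in counts.items() if n == top)


root_cats = pd.Series([
    ("grid",),
    ("test", "grid"),              # tie between two categories
    ("fault", "fault", "sensor"),
])
most_common = root_cats.apply(most_common_cat)
```

Note that this sketch assumes every tuple is non-empty; an empty tuple would make max() raise a ValueError.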

wtphm.batch.get_cat_all_ids(root_cats, cat)

Get an index of batches where there is only a single certain category present in the categories of the root alarms.

Parameters:

root_cats (pd.Series) – Series of strings, where each string is the categories of each of the root alarms in a batch, separated by commas.
cat (string) – The category to check the presence of

Returns:

cat_present_idx (pd.Index) – The index of batch entries where `cat` was the only category present in the `root_cats`

wtphm.batch.get_cat_present_ids(root_cats, cat)

Get an index of batches where a certain category is present in the categories of the root alarms.

Parameters:

root_cats (pd.Series) – Series of strings, where each string is the categories of each of the root alarms in a batch, separated by commas.
cat (string) – The category to check the presence of

Returns:

cat_present_idx (pd.Index) – The index of batch entries where `cat` was present in the `root_cats`
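
The difference between a category being present among the root alarms and being the only category present can be sketched with plain pandas. This does not call the library; it assumes root_cats holds comma-separated strings as described in the parameters, and the batch ids and the helper split_cats are made up for illustration.

```python
import pandas as pd


def split_cats(entry):
    """Split a comma-separated category string into a list of category names."""
    return [c.strip() for c in entry.split(",")]


# Hypothetical root categories, indexed by batch id.
root_cats = pd.Series(
    ["grid", "grid, sensor", "sensor, sensor", "normal"],
    index=[101, 102, 103, 104],
)
cat = "sensor"

# Batches where `cat` appears at least once among the root alarm categories.
is_present = root_cats.apply(lambda s: cat in split_cats(s))
cat_present_idx = root_cats[is_present].index

# Batches where `cat` is the only category among the root alarm categories.
is_only = root_cats.apply(lambda s: set(split_cats(s)) == {cat})
cat_only_idx = root_cats[is_only].index
```

Here batch 102 counts as having "sensor" present but not as all-"sensor", because it also contains a "grid" root alarm.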

wtphm.batch.get_counter_active_ids(batch_data, scada_data, counter_col, counter_value=0)

Get an index of batches during which a certain scada counter was active

In 10-minute SCADA data there are often counters for when the turbine was in various different states, for calculating contractual availability. This function finds the named counter_col in scada_data, and identifies any sample periods where this value was above counter_value.

If any of these sample periods fall within a certain batch, then this function returns those batch ids.

Parameters:

batch_data (pd.DataFrame) – The batches of events
scada_data (pd.DataFrame) – The 10-minute SCADA data
counter_col (string) – The column in the SCADA data with a counter
counter_value (int) – Batches containing SCADA entries with a counter above this value will have their index returned

Returns:

counter_active_index (pd.Index) – The ids of the batches during which `counter_col` in `scada_data` had a value above `counter_value`.
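
The matching of SCADA samples to batches can be sketched as below. This is an illustration rather than the library's implementation: the helper name batches_with_active_counter, the SCADA column names "time" and "turbine_num", and the example counter column "grid_counter" are assumptions, while start_time, down_end_time and turbine_num in batch_data follow the headings listed for get_batch_data.

```python
import pandas as pd


def batches_with_active_counter(batch_data, scada_data, counter_col, counter_value=0):
    """Return ids of batches containing a SCADA sample with counter > counter_value."""
    active_ids = []
    for batch in batch_data.itertuples():
        # SCADA samples for the same turbine that fall inside the batch window.
        in_batch = (
            (scada_data["turbine_num"] == batch.turbine_num)
            & (scada_data["time"] >= batch.start_time)
            & (scada_data["time"] <= batch.down_end_time)
        )
        if (scada_data.loc[in_batch, counter_col] > counter_value).any():
            active_ids.append(batch.Index)
    return pd.Index(active_ids)


# Hypothetical batches (ids 7 and 8) and SCADA samples for turbine 1.
batch_data = pd.DataFrame(
    {
        "turbine_num": [1, 1],
        "start_time": pd.to_datetime(["2020-01-01 00:00", "2020-01-02 00:00"]),
        "down_end_time": pd.to_datetime(["2020-01-01 01:00", "2020-01-02 01:00"]),
    },
    index=[7, 8],
)
scada_data = pd.DataFrame(
    {
        "turbine_num": [1, 1, 1],
        "time": pd.to_datetime(
            ["2020-01-01 00:10", "2020-01-01 12:00", "2020-01-02 00:30"]
        ),
        "grid_counter": [600, 0, 0],
    }
)
active = batches_with_active_counter(batch_data, scada_data, "grid_counter")
```

Only batch 7 is returned here, since it is the only batch whose time window contains a sample with a non-zero grid counter.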