wtphm.batch

This module contains functions for creating the batch_data.

See more in the Overview.

wtphm.batch.get_grouped_event_data(event_data, code_groups, fault_codes)

Groups together similar event codes as the same code.

Returns the events DataFrame with fault events that have different but similar codes and descriptions grouped together and relabelled to share the same code and description.

More information can be found in the Group Faults of the Same Type section of the user guide.

Parameters:
  • event_data (pandas.DataFrame) – The original events/fault data.
  • code_groups (list-like, optional, default=None) – The groups of similar events with similar codes/descriptions. Must be a list or list-of-lists, e.g. [[10, 11, 12], [24, 25], [56, 57, 58]] or [10, 11, 12].
  • fault_codes (numpy.ndarray) – All event codes that will be treated as fault events for the batches.
Returns:

  • grouped_event_data (pandas.DataFrame) – The event_data, but with codes and descriptions from code_groups changed so that similar ones are identical
  • grouped_fault_codes (pandas.DataFrame) – The fault_codes, but with the similar codes in each group treated as identical
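
The relabelling can be illustrated with a minimal, hypothetical sketch in plain Python. It is not the wtphm implementation: events are (code, description) tuples rather than rows of event_data, and it assumes each group is relabelled to its lowest code and to the description of the first event seen from that group. The names group_codes and normalise_groups are made up for this example.

```python
def normalise_groups(code_groups):
    """Accept a flat list such as [10, 11, 12] or a list-of-lists."""
    if code_groups and not isinstance(code_groups[0], (list, tuple)):
        return [list(code_groups)]
    return [list(group) for group in code_groups]


def group_codes(events, code_groups, fault_codes):
    """Hypothetical sketch: events is a list of (code, description) tuples."""
    groups = normalise_groups(code_groups)
    # Assumption: every code in a group is mapped to the group's lowest code.
    representative = {code: min(group) for group in groups for code in group}
    first_desc = {}  # representative code -> first description seen for it
    grouped_events = []
    for code, desc in events:
        if code in representative:
            code = representative[code]
            desc = first_desc.setdefault(code, desc)
        grouped_events.append((code, desc))
    # Fault codes collapse in the same way, so each group counts as one code.
    grouped_fault_codes = sorted({representative.get(c, c) for c in fault_codes})
    return grouped_events, grouped_fault_codes


events = [(10, "pitch fault A"), (11, "pitch fault B"), (24, "yaw fault"), (99, "ok")]
grouped, codes = group_codes(events, [[10, 11, 12], [24, 25]], [10, 11, 24, 25, 30])
print(grouped)  # [(10, 'pitch fault A'), (10, 'pitch fault A'), (24, 'yaw fault'), (99, 'ok')]
print(codes)    # [10, 24, 30]
```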

wtphm.batch.get_batch_data(event_data, fault_codes, ok_code, t_sep_lim='12 hour')

Get the distinct batches of events as they appear in the event_data.

Each batch is a group of fault events that occurred during a fault-related shutdown. A batch always begins with a fault event from one of the codes in fault_codes, and ends with the code ok_code, which signifies the turbine returning to normal operation.

More information can be found in Creating Batches.

Parameters:
  • event_data (pandas.DataFrame) – The original events/fault data.
  • fault_codes (numpy.ndarray) – All event codes that will be treated as fault events for the batches.
  • ok_code (int) – A code which signifies the turbine returning to normal operation after being shut down or curtailed due to a fault or otherwise.
  • t_sep_lim (str, default='12 hour', must be compatible with pd.Timedelta) – If a batch ends and a second batch begins less than t_sep_lim afterwards, the two batches are treated as one, so that the turbine coming back online and immediately faulting again counts as one continuous batch. This effect stacks: if a third fault event happens less than t_sep_lim after the second batch ends, all three are treated as the same continuous batch.
Returns:

batch_data (pd.DataFrame) – DataFrame with the following headings:

  • turbine_num: turbine number of the batch
  • fault_root_codes: the fault codes present at the first timestamp in the batch
  • all_root_codes: all event start codes present at the first timestamp in the batch
  • start_time: start of first event in the batch
  • fault_end_time: time_on of the last fault event in the batch
  • down_end_time: the time_on of the last event in the batch, i.e. the last ok_code event in the batch
  • fault_dur: duration from start of first fault event to start of final fault event in the batch
  • down_dur: duration of total downtime in the batch, i.e. from start of first fault event to start of last ok_code event
  • fault_event_ids: indices in the events data of faults that occurred
  • all_event_ids: indices in the events data of all events (fault or otherwise) that occurred during the batch
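
The batching rules described above can be sketched in plain Python as follows. This is a simplified, hypothetical illustration rather than wtphm's get_batch_data: events are (time_on, code) tuples sorted by time, t_sep_lim is a datetime.timedelta instead of a pd.Timedelta-compatible string, only four of the batch_data columns are produced, and a batch with no closing ok_code at the end of the data is assumed to be dropped. The names sketch_batches and at are made up.

```python
from datetime import datetime, timedelta


def sketch_batches(events, fault_codes, ok_code, t_sep_lim=timedelta(hours=12)):
    """Simplified batching sketch, not the wtphm implementation.

    events: list of (time_on, code) tuples sorted by time_on.
    Returns a list of dicts holding a subset of the batch_data columns.
    """
    # Pass 1: a fault code opens a batch, the next ok_code closes it.
    raw = []
    start_idx = None
    for idx, (_, code) in enumerate(events):
        if start_idx is None:
            if code in fault_codes:
                start_idx = idx
        elif code == ok_code:
            raw.append((start_idx, idx))
            start_idx = None
    # Assumption: a batch still open at the end of the data is dropped.

    # Pass 2: join a batch onto the previous one if it starts less than
    # t_sep_lim after the previous one ended; this stacks across batches.
    merged = []
    for first, last in raw:
        if merged and events[first][0] - events[merged[-1][1]][0] < t_sep_lim:
            merged[-1] = (merged[-1][0], last)
        else:
            merged.append((first, last))

    batches = []
    for first, last in merged:
        ids = list(range(first, last + 1))
        batches.append({
            "start_time": events[first][0],
            "down_end_time": events[last][0],
            "fault_event_ids": [i for i in ids if events[i][1] in fault_codes],
            "all_event_ids": ids,
        })
    return batches


def at(hour, minute=0):
    return datetime(2020, 1, 1, hour, minute)


events = [
    (at(0), 5),        # normal event, outside any batch
    (at(1), 100),      # fault: opens a batch
    (at(1, 10), 200),  # further fault in the same batch
    (at(2), 9),        # ok_code: closes it
    (at(2, 30), 5),
    (at(3), 100),      # faults again 1 hour later: merged with the first batch
    (at(4), 9),
    (at(20), 200),     # 16 hours after the previous ok_code: a new batch
    (at(21), 9),
]
for b in sketch_batches(events, {100, 200}, ok_code=9):
    print(b["start_time"], b["down_end_time"], b["fault_event_ids"])
# 2020-01-01 01:00:00 2020-01-01 04:00:00 [1, 2, 5]
# 2020-01-01 20:00:00 2020-01-01 21:00:00 [7]
```

Lowering t_sep_lim to 30 minutes in this example keeps the 01:00 and 03:00 shutdowns as separate batches, giving three batches instead of two.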

wtphm.batch.get_batch_stop_cats(batch_data, event_data, scada_data, grid_col, maint_col, rep_col, grid_cval=0, maint_cval=0, rep_cval=0)

Labels each batch with an assumed stop category, based on the stop categories of the root event(s) that triggered it, i.e. the one or more simultaneous events that caused the turbine to stop. The rules below are applied in order, so items lower down supersede those higher up:

  • If all root events in the batch are “normal” events, the batch is labelled normal
  • Otherwise, label it as the most common stop category in the root events
  • If a single sensor category event is present, label sensor
  • If a single grid category event is present, label grid. Also label grid if the grid counter was active in the SCADA data; this counter is a timer indicating how long the turbine was down due to grid issues, used for calculating contractual availability
  • If the maintenance counter was active in the SCADA data, label maint
  • An additional column, “repairs”, is also added. If the repair counter was active, the turbine was brought down for repairs, and this column is given the value “TRUE” for those batches.

Parameters:
  • batch_data (pd.DataFrame) – The batch data
  • event_data (pd.DataFrame) – The events data
  • scada_data (pd.DataFrame) – The SCADA data
  • grid_col, maint_col, rep_col (string) – The columns of scada_data which contain availability counters for grid issues, turbine maintenance and repairs, respectively
  • grid_cval, maint_cval, rep_cval (int, default=0) – The minimum total sum of the grid, maintenance and repair counters, respectively, throughout the duration of a batch for it to be marked as grid, maintenance or repair
Returns:

batch_data_sc (pd.DataFrame) – The original batch_data DataFrame, but with the following headings added:

  • batch_cat: The stop categories of each batch
  • repairs: The repair status of each batch
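
The precedence of these rules can be sketched as a plain function acting on one batch. This is a hypothetical illustration, not the wtphm code: it takes the batch's root categories as a tuple of strings, the already-computed most common category (as a step like get_most_common_cats would produce), and booleans standing in for the grid, maintenance and repair counter checks. It also reads "a single sensor/grid category event is present" as "at least one root event has that category", which is an assumption. The name sketch_batch_cat is made up.

```python
def sketch_batch_cat(root_cats, most_common_cat, grid_active=False,
                     maint_active=False, rep_active=False):
    """Hypothetical sketch: returns (batch_cat, repairs) for one batch."""
    # Rules 1 and 2: all-normal roots give "normal", otherwise the most common cat.
    if all(cat == "normal" for cat in root_cats):
        batch_cat = "normal"
    else:
        batch_cat = most_common_cat
    # Later rules overwrite earlier ones.
    # Assumption: "a single ... event is present" is read as "at least one".
    if "sensor" in root_cats:
        batch_cat = "sensor"
    if "grid" in root_cats or grid_active:
        batch_cat = "grid"
    if maint_active:
        batch_cat = "maint"
    # The repair flag is reported separately and does not change batch_cat.
    return batch_cat, rep_active


print(sketch_batch_cat(("normal", "normal"), "normal"))         # ('normal', False)
print(sketch_batch_cat(("pitch", "sensor"), "pitch, sensor"))   # ('sensor', False)
print(sketch_batch_cat(("pitch",), "pitch", grid_active=True))  # ('grid', False)
print(sketch_batch_cat(("grid",), "grid", maint_active=True, rep_active=True))  # ('maint', True)
```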

wtphm.batch.get_root_cats(batch_data, event_data)

Gets the categories of the root alarms in the batch_data.

Parameters:
  • batch_data (pd.DataFrame) – The batch data
  • event_data (pd.DataFrame) – The events data
Returns:

root_cats (pd.Series) – Series of tuples, where each tuple contains strings of the stop_cats for each of the root alarms in a batch
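
As a rough illustration of what "root alarms" means here, the sketch below treats the root events of a batch as all events sharing the batch's earliest timestamp, consistent with the all_root_codes column described for get_batch_data. It uses plain lists instead of DataFrames, and the function name and input layout are hypothetical.

```python
from datetime import datetime


def sketch_root_cats(batches, events):
    """Hypothetical sketch.

    batches: list of lists of event ids (like all_event_ids in batch_data).
    events: list of (time_on, stop_cat) tuples indexed by event id.
    Returns one tuple of root stop categories per batch.
    """
    root_cats = []
    for ids in batches:
        first_time = min(events[i][0] for i in ids)
        # Root events: every event that starts at the batch's first timestamp.
        root_cats.append(tuple(events[i][1] for i in ids if events[i][0] == first_time))
    return root_cats


events = [
    (datetime(2020, 1, 1, 1), "pitch"),
    (datetime(2020, 1, 1, 1), "sensor"),
    (datetime(2020, 1, 1, 2), "normal"),
    (datetime(2020, 1, 2, 5), "grid"),
    (datetime(2020, 1, 2, 6), "normal"),
]
print(sketch_root_cats([[0, 1, 2], [3, 4]], events))
# [('pitch', 'sensor'), ('grid',)]
```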

wtphm.batch.get_most_common_cats(root_cats)

Gets the most common root fault category for each batch from a Series of root alarm categories.

Parameters:
  • root_cats (pd.Series) – Series of tuples, where each tuple contains strings of the stop_cats for each of the root alarms in a batch
Returns:

most_common_cats (pd.Series) – Each entry in the Series is a string containing the most commonly occurring root fault category in that batch. In the case of a tie, the tied categories are all included, e.g. ‘test, grid’

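
The tie-handling rule can be sketched with collections.Counter. This is an illustrative assumption rather than the wtphm code: it works on a list of tuples instead of a pd.Series, assumes each batch has at least one root category, and joins tied categories with ", " in the order they first appear. The name sketch_most_common_cats is made up.

```python
from collections import Counter


def sketch_most_common_cats(root_cats):
    """Hypothetical sketch: root_cats is a list of tuples of category strings."""
    most_common = []
    for cats in root_cats:
        counts = Counter(cats)
        top = max(counts.values())  # raises ValueError for an empty tuple
        # Keep every category tied for the highest count, in first-seen order.
        most_common.append(", ".join(c for c in counts if counts[c] == top))
    return most_common


print(sketch_most_common_cats([("test", "grid"), ("pitch", "pitch", "grid")]))
# ['test, grid', 'pitch']
```
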
wtphm.batch.get_cat_all_ids(root_cats, cat)

Get an index of batches where there is only a single certain category present in the categories of the root alarms.

Parameters:
  • root_cats (pd.Series) – Series of strings, where each string is the categories of each of the root alarms in a batch, separated by commas.
  • cat (string) – The category to check the presence of
Returns:

cat_present_idx (pd.Index) – The index of batch entries where cat was the only category present in the root_cats
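
A minimal sketch of the "only this category" check, assuming root_cats is represented as a dict of batch id to comma-separated category string (standing in for the pd.Series and its index). The function name is hypothetical, and the sketch compares whole category names after stripping whitespace.

```python
def sketch_cat_all_ids(root_cats, cat):
    """Return batch ids whose root categories are all equal to cat."""
    matches = []
    for batch_id, cats in root_cats.items():
        # Split "grid, grid" into {"grid"} and compare with the target category.
        if {c.strip() for c in cats.split(",")} == {cat}:
            matches.append(batch_id)
    return matches


root_cats = {0: "grid, grid", 1: "grid, sensor", 2: "sensor"}
print(sketch_cat_all_ids(root_cats, "grid"))  # [0]
```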

wtphm.batch.get_cat_present_ids(root_cats, cat)

Get an index of batches where a certain category is present in the categories of the root alarms.

Parameters:
  • root_cats (pd.Series) – Series of strings, where each string is the categories of each of the root alarms in a batch, separated by commas.
  • cat (string) – The category to check the presence of
Returns:

cat_present_idx (pd.Index) – The index of batch entries where cat was present in the root_cats
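
The "category present" check differs only in using membership instead of equality. Again this is a hypothetical sketch on a dict of batch id to comma-separated string; it matches whole category names, and whether wtphm matches whole names or substrings is not stated here.

```python
def sketch_cat_present_ids(root_cats, cat):
    """Return batch ids where cat appears among the root categories."""
    return [
        batch_id
        for batch_id, cats in root_cats.items()
        # Whole-name match per category, so "grid" would not match "gridloss".
        if cat in (c.strip() for c in cats.split(","))
    ]


root_cats = {0: "grid, grid", 1: "grid, sensor", 2: "sensor"}
print(sketch_cat_present_ids(root_cats, "grid"))    # [0, 1]
print(sketch_cat_present_ids(root_cats, "sensor"))  # [1, 2]
```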

wtphm.batch.get_counter_active_ids(batch_data, scada_data, counter_col, counter_value=0)

Get an index of batches during which a certain SCADA counter was active.

In 10-minute SCADA data there are often counters for when the turbine was in various states, used for calculating contractual availability. This function finds the counter_col column in scada_data and identifies any sample periods where its value was above counter_value.

If any of these sample periods fall within a certain batch, then this function returns those batch ids.

Parameters:
  • batch_data (pd.DataFrame) – The batches of events
  • scada_data (pd.DataFrame) – The 10-minute SCADA data
  • counter_col (string) – The column in the SCADA data with a counter
  • counter_value (int, default=0) – SCADA samples with a counter above this value are treated as active
Returns:

counter_active_index (pd.Index) – The ids of batches during which counter_col in scada_data had a value above counter_value.
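
A minimal sketch of this check, assuming plain Python structures: batches maps a batch id to its (start_time, down_end_time) pair, scada is a list of (timestamp, counter) samples, and a sample counts as inside a batch when its timestamp lies between those two times inclusive. How wtphm aligns 10-minute sample periods with batch boundaries is not described here, so that rule is an assumption, as is the function name.

```python
from datetime import datetime, timedelta


def sketch_counter_active_ids(batches, scada, counter_value=0):
    """Hypothetical sketch.

    batches: dict of batch id -> (start_time, down_end_time).
    scada: list of (timestamp, counter) samples from 10-minute SCADA data.
    """
    # Timestamps of samples where the counter exceeds counter_value.
    active_times = [ts for ts, count in scada if count > counter_value]
    # Assumption: a sample falls within a batch if its timestamp lies in
    # [start_time, down_end_time]; the SCADA period-labelling convention is ignored.
    return [
        batch_id
        for batch_id, (start, end) in batches.items()
        if any(start <= ts <= end for ts in active_times)
    ]


t0 = datetime(2020, 1, 1)
scada = [
    (t0 + timedelta(minutes=10), 0),
    (t0 + timedelta(minutes=20), 300),
    (t0 + timedelta(minutes=30), 0),
    (t0 + timedelta(hours=2), 600),
]
batches = {
    0: (t0, t0 + timedelta(minutes=25)),
    1: (t0 + timedelta(minutes=40), t0 + timedelta(hours=1)),
    2: (t0 + timedelta(minutes=110), t0 + timedelta(minutes=130)),
}
print(sketch_counter_active_ids(batches, scada))                     # [0, 2]
print(sketch_counter_active_ids(batches, scada, counter_value=400))  # [2]
```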