nilmtk package

Subpackages

Submodules

nilmtk.appliance module

class nilmtk.appliance.Appliance(metadata=None)[source]

Bases: nilmtk.hashable.Hashable

Represents an appliance instance.

Attributes

metadata (dict) For the metadata attributes, see http://nilm-metadata.readthedocs.org/en/latest/dataset_metadata.html#appliance
appliance_types = {}

Static (AKA class) variable. Maps from appliance_type (string) to a dict describing metadata about each appliance type.

categories()[source]

Return 1D list of category names (strings).

identifier[source]

Return ApplianceID

label(pretty=False)[source]

If pretty=False, return a string of the form ‘(<type>, <identifier>)’, e.g. ‘(fridge, 1)’. If pretty=True, return a string like ‘Fridge’ or ‘Fridge 2’. If type == ‘unknown’ then original_name is appended to the end of the label.

matches(key)[source]
Parameters:

key : dict

Returns:

Bool

True if all key:value pairs in key match appliance.metadata or Appliance.appliance_types[appliance.metadata[‘type’]]. Returns True if key is empty dict.
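The matching rule above can be sketched in plain Python. This is an illustration only, not nilmtk's implementation: the function name, the two-dict signature and the example metadata values are assumptions made for the sketch.

```python
def matches(metadata, type_metadata, key):
    """Hypothetical sketch: True if every key:value pair in `key` is found
    in `metadata` or, failing that, in `type_metadata`."""
    for k, wanted in key.items():
        if k in metadata:
            if metadata[k] != wanted:
                return False
        elif k in type_metadata:
            if type_metadata[k] != wanted:
                return False
        else:
            # The key is not described anywhere, so it cannot match.
            return False
    # An empty key dict falls through to here and therefore matches.
    return True


# Made-up metadata for one appliance and for its appliance type.
fridge = {'type': 'fridge', 'instance': 1, 'room': 'kitchen'}
fridge_type = {'parent': 'cold appliance'}

print(matches(fridge, fridge_type, {'type': 'fridge', 'room': 'kitchen'}))
```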

n_meters[source]

Return number of meters (int) to which this appliance is connected

on_power_threshold()[source]
type[source]

Return deepcopy of dict describing appliance type.

class nilmtk.appliance.ApplianceID

Bases: tuple

instance

Alias for field number 1

type

Alias for field number 0
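A minimal sketch of a tuple with the same field order (type at index 0, instance at index 1), built with collections.namedtuple. The example values are made up, and this mirrors the documented layout rather than reproducing nilmtk's source.

```python
from collections import namedtuple

# Field order follows the documentation: type is field 0, instance is field 1.
ApplianceID = namedtuple('ApplianceID', ['type', 'instance'])

fridge_id = ApplianceID(type='fridge', instance=1)

# Fields can be read by name or by position, and the tuple is hashable,
# so it can be used as a dictionary key.
labels = {fridge_id: 'Fridge'}
print(fridge_id.type, fridge_id[1], labels[fridge_id])
```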

nilmtk.building module

class nilmtk.building.Building[source]

Bases: nilmtk.hashable.Hashable

Attributes

elec (MeterGroup)
metadata (dict) Metadata just about this building (e.g. geo location). See http://nilm-metadata.readthedocs.org/en/latest/dataset_metadata.html#building for details. Has this additional key: dataset (string).
describe(**kwargs)[source]

Returns a Series describing this building.

identifier[source]
import_metadata(store, key, dataset_name)[source]
save(destination, key)[source]
class nilmtk.building.BuildingID

Bases: tuple

dataset

Alias for field number 1

instance

Alias for field number 0

nilmtk.consts module

nilmtk.dataset module

class nilmtk.dataset.DataSet(filename=None, format='HDF')[source]

Bases: object

Attributes

buildings (OrderedDict) Each key is an integer, starting from 1. Each value is a nilmtk.Building object.
store (nilmtk.DataStore)
metadata (dict) Metadata describing the dataset name, authors etc. (Metadata about specific buildings, meters, appliances etc. is stored elsewhere.) See http://nilm-metadata.readthedocs.org/en/latest/dataset_metadata.html#dataset
clear_cache()[source]
describe(**kwargs)[source]

Returns a DataFrame describing this dataset. Each column is a building. Each row is a feature.

elecs()[source]
import_metadata(store)[source]
Parameters:store : nilmtk.DataStore
plot_good_sections(axes=None, label_func=None, gap=0, **kwargs)[source]

Plots all good sections for all buildings.

Parameters:

axes : list of axes or None.

If None then they will be generated.

Returns:

axes : list of axes

plot_mains_power_histograms(axes=None, **kwargs)[source]
save(destination)[source]
set_window(start=None, end=None)[source]

Set the timeframe window on self.store. Used for setting the ‘region of interest’ non-destructively for all processing.

Parameters:start, end : str or pd.Timestamp or datetime or None

nilmtk.docinherit module

doc_inherit decorator

Usage:

    class Foo(object):
        def foo(self):
            "Frobber"
            pass

    class Bar(Foo):
        @doc_inherit
        def foo(self):
            pass

Now, Bar.foo.__doc__ == Bar().foo.__doc__ == Foo.foo.__doc__ == “Frobber”

from: http://code.activestate.com/recipes/576862-docstring-inheritance-decorator/
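For readers curious how such a decorator can work, here is a simplified, hypothetical descriptor that copies the docstring from the first parent class defining a method of the same name. It is a sketch under that assumption, not the recipe linked above and not nilmtk's DocInherit; the class name DocInheritSketch is invented for the example.

```python
from functools import wraps


class DocInheritSketch(object):
    """Hypothetical, simplified docstring-inheriting method descriptor."""

    def __init__(self, mthd):
        self.mthd = mthd
        self.name = mthd.__name__

    def __get__(self, obj, cls):
        # Look through the parents (skipping cls itself) for the first
        # attribute with the same name as the decorated method.
        for parent in cls.__mro__[1:]:
            source = getattr(parent, self.name, None)
            if source is not None:
                break
        else:
            raise NameError("Can't find '%s' in parents" % self.name)

        @wraps(self.mthd)
        def wrapper(*args, **kwargs):
            if obj is not None:
                return self.mthd(obj, *args, **kwargs)
            return self.mthd(*args, **kwargs)

        # Replace the (missing) docstring with the parent's docstring.
        wrapper.__doc__ = source.__doc__
        return wrapper


class Foo(object):
    def foo(self):
        "Frobber"
        return 'foo'


class Bar(Foo):
    @DocInheritSketch
    def foo(self):
        return 'bar'


print(Bar.foo.__doc__)
```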

class nilmtk.docinherit.DocInherit(mthd)[source]

Bases: object

Docstring inheriting method descriptor

The class itself is also used as a decorator

get_no_inst(cls)[source]
get_with_inst(obj, cls)[source]
use_parent_doc(func, source)[source]
nilmtk.docinherit.doc_inherit

alias of DocInherit

nilmtk.elecmeter module

class nilmtk.elecmeter.ElecMeter(store=None, metadata=None, meter_id=None)[source]

Bases: nilmtk.hashable.Hashable, nilmtk.electric.Electric

Represents a physical electricity meter.

Attributes

appliances (list of Appliance objects) Appliances connected immediately downstream of this meter. Will be [] if no appliances are connected directly to this meter.
store (nilmtk.DataStore)
key (string) key into nilmtk.DataStore to access data.
metadata (dict) See http://nilm-metadata.readthedocs.org/en/latest/dataset_metadata.html#elecmeter
available_ac_types(physical_quantity)[source]

Finds available alternating current types for a specific physical quantity.

Parameters:physical_quantity : str or list of strings
Returns:list of strings e.g. [‘apparent’, ‘active’]
available_columns()[source]
Returns:list of 2-tuples of strings e.g. [(‘power’, ‘active’)]
available_physical_quantities()[source]
Returns:list of strings e.g. [‘power’, ‘energy’]
building()[source]
clear_cache(verbose=False)[source]

See also

_compute_stat, _get_stat_from_cache_or_compute, key_for_cached_stat, get_cached_stat

dataset()[source]
device[source]
Returns:dict describing the MeterDevice for this meter (sample period etc).
dominant_appliance()[source]

Tries to find the most dominant appliance on this meter, and then returns that appliance object. Will return None if there are no appliances on this meter.

dropout_rate(ignore_gaps=True, **loader_kwargs)[source]
Parameters:

ignore_gaps : bool, default=True

If True then will only calculate dropout rate for good sections.

full_results : bool, default=False

**loader_kwargs : key word arguments for DataStore.load()

Returns:

DropoutRateResults object if full_results is True,

else float

dry_run_metadata()[source]
get_cached_stat(key_for_stat)[source]
Parameters:key_for_stat : str
Returns:pd.DataFrame

See also

_compute_stat, _get_stat_from_cache_or_compute, key_for_cached_stat, clear_cache

get_metadata()[source]
get_source_node(**loader_kwargs)[source]
get_timeframe()[source]
good_sections(**loader_kwargs)[source]
Parameters:

full_results : bool, default=False

**loader_kwargs : key word arguments for DataStore.load()

Returns:

if full_results is True then return nilmtk.stats.GoodSectionsResults

object otherwise return list of TimeFrame objects.

instance()[source]
is_site_meter()[source]
key[source]
key_for_cached_stat(stat_name)[source]
Parameters:stat_name : str
Returns:key : str

See also

clear_cache, _compute_stat, _get_stat_from_cache_or_compute, get_cached_stat

label(pretty=True)[source]

Returns a string describing this meter.

Parameters:

pretty : boolean

If True then just return the type name of the dominant appliance (without the instance number) or metadata[‘name’], with the first letter capitalised.

Returns:

string : A label listing all the appliance types.

load(**kwargs)[source]

Returns a generator of DataFrames loaded from the DataStore.

By default, load will load all available columns from the DataStore. Specific columns can be selected in one of two mutually exclusive ways:

  1. specify a list of column names using the cols parameter.
  2. specify a physical_quantity and/or an ac_type parameter to ask load to automatically select columns.

If ‘resample’ is set to ‘True’ then, by default, gaps shorter than max_sample_period will be forward filled.

Parameters:

physical_quantity : string or list of strings

e.g. ‘power’ or ‘voltage’ or ‘energy’ or [‘power’, ‘energy’]. If a single string then load columns only for that physical quantity. If a list of strings then load columns for all those physical quantities.

ac_type : string or list of strings, defaults to None

Where ‘ac_type’ is short for ‘alternating current type’. e.g. ‘reactive’ or ‘active’ or ‘apparent’. If set to None then will load all AC types per physical quantity. If set to ‘best’ then load the single best AC type per physical quantity. If set to a single AC type then load just that single AC type per physical quantity, else raise an Exception. If set to a list of AC type strings then will load all those AC types and will raise an Exception if any cannot be found.

cols : list of tuples, using NILMTK’s vocabulary for measurements.

e.g. [(‘power’, ‘active’), (‘voltage’, ‘’), (‘energy’, ‘reactive’)]. cols can’t be used if ac_type and/or physical_quantity are set.

sample_period : int, defaults to None

Number of seconds to use as the new sample period for resampling. If None then will use self.sample_period()

resample : boolean, defaults to False

If True then will resample data using sample_period. Defaults to True if sample_period is not None.

resample_kwargs : dict of key word arguments (other than ‘rule’) to

pass to pd.DataFrame.resample(). Defaults to set ‘limit’ to sample_period / max_sample_period and sets ‘fill_method’ to ffill.

preprocessing : list of Node subclass instances

e.g. [Clip()].

**kwargs : any other key word arguments to pass to self.store.load()

Returns:

Always return a generator of DataFrames (even if it only has a single

column).

Raises:

nilmtk.exceptions.MeasurementError if a measurement is specified

which is not available.
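The two mutually exclusive ways of selecting columns can be sketched as a small stand-alone function. This is an illustration only: select_columns is a hypothetical helper, it raises ValueError where nilmtk is documented to raise MeasurementError, and it does not handle ac_type=‘best’.

```python
def select_columns(available, physical_quantity=None, ac_type=None, cols=None):
    """Hypothetical sketch of column selection over a list of
    (physical_quantity, ac_type) 2-tuples."""
    if cols is not None:
        # Way 1: an explicit list of columns; it cannot be combined with way 2.
        if physical_quantity is not None or ac_type is not None:
            raise ValueError(
                "cols can't be used with physical_quantity or ac_type")
        missing = [c for c in cols if c not in available]
        if missing:
            raise ValueError('Not available: {}'.format(missing))
        return list(cols)

    def as_list(value):
        # A single string is treated as a one-element list; None stays None.
        return [value] if isinstance(value, str) else value

    # Way 2: filter by physical quantity and/or AC type (None means 'all').
    quantities = as_list(physical_quantity)
    ac_types = as_list(ac_type)
    selected = [(q, a) for (q, a) in available
                if (quantities is None or q in quantities)
                and (ac_types is None or a in ac_types)]

    if ac_types is not None:
        found = {a for (_, a) in selected}
        missing = [a for a in ac_types if a not in found]
        if missing:
            raise ValueError('AC types not available: {}'.format(missing))
    return selected


available = [('power', 'active'), ('power', 'apparent'),
             ('voltage', ''), ('energy', 'active')]
print(select_columns(available, physical_quantity='power', ac_type='active'))
```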

classmethod load_meter_devices(store)[source]
matches(key)[source]
Parameters:key : dict
Returns:Bool
meter_devices = {}
name[source]
sample_period()[source]
save(destination, key)[source]

Convert all relevant attributes to a dict to be saved as metadata in destination at location specified by key

total_energy(**loader_kwargs)[source]
Parameters:

full_results : bool, default=False

**loader_kwargs : key word arguments for DataStore.load()

Returns:

if full_results is True then return TotalEnergyResults object

else returns a pd.Series with a row for each AC type.

upstream_meter(raise_warning=True)[source]
Returns:ElecMeterID of upstream meter or None if is site meter.
class nilmtk.elecmeter.ElecMeterID

Bases: tuple

building

Alias for field number 1

dataset

Alias for field number 2

instance

Alias for field number 0

nilmtk.electric module

class nilmtk.electric.Electric[source]

Bases: object

Common implementations of methods shared by ElecMeter and MeterGroup.

activation_series(min_off_duration=None, min_on_duration=None, border=1, on_power_threshold=None, **kwargs)[source]

Returns runs of an appliance.

Most appliances spend a lot of their time off. This function finds periods when the appliance is on.

Parameters:

min_off_duration : int

If min_off_duration > 0 then ignore ‘off’ periods of sub-threshold power consumption lasting less than min_off_duration seconds (e.g. a washing machine might draw no power for a short period while the clothes soak). Defaults to the value from metadata or, if metadata is absent, to 0.

min_on_duration : int

Any activation lasting fewer seconds than min_on_duration will be ignored. Defaults to the value from metadata or, if metadata is absent, to 0.

border : int

Number of rows to include before and after the detected activation

on_power_threshold : int or float

Defaults to self.on_power_threshold()

**kwargs : kwargs for self.power_series()

Returns:

list of pd.Series. Each series contains one activation.

activity_histogram(period='D', bin_duration='H', **kwargs)[source]

Return a histogram vector showing when activity occurs.

e.g. to see when, over the course of an average day, activity occurs then use bin_duration=’H’ and period=’D’.

Parameters:

period : str. Pandas period alias.

bin_duration : str. Pandas period alias e.g. ‘H’ = hourly; ‘D’ = daily.

Width of each bin of the histogram. bin_duration must exactly divide the chosen period.

Returns:

hist : np.ndarray

length will be period / bin_duration
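As an illustration of the default case (period=‘D’, bin_duration=‘H’), the sketch below counts "on" samples per hour of day, giving a vector of length 24. The function name and the input format (a list of datetimes at which the appliance was on) are assumptions; nilmtk itself works on its own power data.

```python
from datetime import datetime


def hourly_activity_histogram(on_timestamps):
    """Hypothetical sketch: count 'on' samples in each hour of the day."""
    hist = [0] * 24  # one bin per hour: period / bin_duration = 24
    for ts in on_timestamps:
        hist[ts.hour] += 1
    return hist


# Made-up timestamps at which some appliance was on.
on_timestamps = [
    datetime(2014, 1, 1, 7, 15),
    datetime(2014, 1, 2, 7, 40),
    datetime(2014, 1, 2, 18, 5),
]
hist = hourly_activity_histogram(on_timestamps)
print(hist[7], hist[18])
```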

available_power_ac_types()[source]

Finds available alternating current types from power measurements.

Returns:

list of strings e.g. [‘apparent’, ‘active’]

Note

Deprecated in NILMTK v0.3

available_power_ac_types should not be used. Instead please use available_ac_types(‘power’).

average_energy_per_period(offset_alias='D', use_uptime=True, **load_kwargs)[source]

Calculate the average energy per period. e.g. the average energy per day.

Parameters:

offset_alias : str

A Pandas offset alias. See: pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases

use_uptime : bool

Returns:

pd.Series

Keys are AC types. Values are energy in kWh per period.
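The arithmetic is likely to be along the following lines: divide the total energy by the number of periods covered. The sketch assumes that use_uptime chooses between the meter's uptime and its full timeframe as the duration; that reading, the function name and the dict-based input are assumptions, not confirmed by this page.

```python
from datetime import timedelta


def average_energy_per_period(total_energy_kwh, duration,
                              period=timedelta(days=1)):
    """Hypothetical sketch: energy per period for each AC type.

    total_energy_kwh : dict mapping AC type -> total energy in kWh
    duration : timedelta covered by the data (e.g. uptime)
    period : timedelta of one period (here one day)
    """
    n_periods = duration / period  # timedelta / timedelta gives a float
    return {ac_type: energy / n_periods
            for ac_type, energy in total_energy_kwh.items()}


# Made-up example: 30 kWh of active energy over 10 days of uptime.
per_day = average_energy_per_period({'active': 30.0}, timedelta(days=10))
print(per_day)
```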

correlation(other, **load_kwargs)[source]

Finds the correlation between the two ElecMeters. Both ElecMeters should be perfectly aligned. Adapted from: http://www.johndcook.com/blog/2008/11/05/how-to-calculate-pearson-correlation-accurately/

Parameters:other : an ElecMeter or MeterGroup object
Returns:float : [-1, 1]
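For reference, a Pearson correlation over two aligned sequences can be computed with a two-pass formula as sketched below. This is a generic illustration on plain lists, not the chunked computation nilmtk performs; the function name is chosen for the example.

```python
import math


def pearson_correlation(x, y):
    """Two-pass Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError('x and y must be aligned and have >= 2 samples')
    # First pass: means. Second pass: centred sums, which avoids the
    # cancellation problems of the one-pass sum-of-squares formula.
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Note: raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


print(pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8]))
```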
entropy(k=3, base=2)[source]

The classic K-L k-nearest neighbor continuous entropy estimator. x should be a list of vectors, e.g. x = [[1.3],[3.7],[5.1],[2.4]] if x is a one-dimensional scalar and we have four samples.

This implementation is provided courtesy of the NPEET toolbox; the authors kindly allowed us to use their code directly. As a courtesy, you may wish to cite their paper if you use this function. Known issue: this fails if there is a large number of records, and how to handle that is still an open question for the toolbox authors.

load_series(**kwargs)[source]
Parameters:

ac_type : str

physical_quantity : str

We sum across ac_types of this physical quantity.

**kwargs : passed through to load().

Returns:

generator of pd.Series. If a single ac_type is found for the

physical_quantity then the series.name will be a normal tuple.

If more than 1 ac_type is found then the ac_type will be a string

of the ac_types with ‘+’ in between. e.g. ‘active+apparent’.

matches_appliances(key)[source]
Parameters:

key : dict

Returns:

True if all key:value pairs in key match any appliance

in self.appliances.

min_off_duration()[source]
min_on_duration()[source]
mutual_information(other, k=3, base=2)[source]

Mutual information of two ElecMeters. x and y should each be a list of vectors, e.g. x = [[1.3],[3.7],[5.1],[2.4]] if x is a one-dimensional scalar and we have four samples.

Parameters:other : ElecMeter or MeterGroup
on_power_threshold()[source]

Returns the minimum on_power_threshold across all appliances immediately downstream of this meter. If any appliance does not have an on_power_threshold then default to 10 watts.
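One possible reading of this rule, sketched in plain Python: an appliance without a threshold contributes the 10 watt default, and the minimum is then taken. Both that reading and the behaviour for a meter with no appliances are assumptions of the sketch.

```python
DEFAULT_ON_POWER_THRESHOLD = 10  # watts, as stated in the description above


def on_power_threshold(appliance_thresholds):
    """Hypothetical sketch.

    appliance_thresholds : list with one entry per downstream appliance,
        either a number of watts or None if the appliance has no threshold.
    """
    if not appliance_thresholds:
        # Assumption: fall back to the default when there are no appliances.
        return DEFAULT_ON_POWER_THRESHOLD
    return min(DEFAULT_ON_POWER_THRESHOLD if t is None else t
               for t in appliance_thresholds)


print(on_power_threshold([50, 20]), on_power_threshold([50, None]))
```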

plot(ax=None, timeframe=None, plot_legend=True, unit='W', plot_kwargs=None, **kwargs)[source]
Parameters:

width : int, optional

Number of points on the x axis required

ax : matplotlib.axes, optional

plot_legend : boolean, optional

Defaults to True. Set to False to not plot legend.

unit : {‘W’, ‘kW’}

**kwargs

plot_activity_histogram(ax=None, period='D', bin_duration='H', plot_kwargs=None, **kwargs)[source]
plot_autocorrelation(ax=None)[source]

Plots autocorrelation of power data. Reference: http://www.itl.nist.gov/div898/handbook/eda/section3/autocopl.htm

Returns:matplotlib.axis
plot_lag(lag=1, ax=None)[source]

Plots a lag plot of power data. Reference: http://www.itl.nist.gov/div898/handbook/eda/section3/lagplot.htm

Returns:matplotlib.axis
plot_power_histogram(ax=None, load_kwargs=None, plot_kwargs=None, range=None, **hist_kwargs)[source]
Parameters:

ax : axes

load_kwargs : dict

plot_kwargs : dict

range : None or tuple

if range=(None, x) then on_power_threshold will be used as minimum.

**hist_kwargs

Returns:

ax

plot_spectrum(ax=None)[source]

Plots a spectral plot of power data. Reference: http://www.itl.nist.gov/div898/handbook/eda/section3/spectrum.htm

Code borrowed from: http://glowingpython.blogspot.com/2011/08/how-to-plot-frequency-spectrum-with.html

Returns:matplotlib.axis
power_series(**kwargs)[source]

Get power Series.

Parameters:

ac_type : str, defaults to ‘best’

**kwargs :

Any other key word arguments are passed to self.load()

Returns:

generator of pd.Series of power measurements.

power_series_all_data(**kwargs)[source]
proportion_of_energy(other, **loader_kwargs)[source]

Compute the proportion of energy of self compared to other.

By default, only uses other.good_sections(). You may want to set sections=self.good_sections().intersection(other.good_sections())

Parameters:

other : nilmtk.MeterGroup or ElecMeter

Typically this will be mains.

Returns:

float [0,1] or NaN if other.total_energy == 0

proportion_of_upstream(**load_kwargs)[source]

Returns a value in the range [0,1] specifying the proportion of the upstream meter’s total energy used by this meter.

switch_times(threshold=40)[source]

Returns an array of pd.DateTime values marking when a switch occurs, as defined by threshold.

Parameters:

threshold : int

Threshold in watts between successive readings that counts as an appliance state change.
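A switch in this sense can be sketched as a jump between successive readings that exceeds the threshold. The function below works on a plain list of (timestamp, watts) pairs; the input format and the use of a strict "greater than" comparison are assumptions of the sketch.

```python
def switch_times(samples, threshold=40):
    """Hypothetical sketch: timestamps at which power changes by more
    than `threshold` watts relative to the previous reading."""
    switches = []
    for (_, previous_watts), (timestamp, watts) in zip(samples, samples[1:]):
        if abs(watts - previous_watts) > threshold:
            switches.append(timestamp)
    return switches


# Made-up readings: (seconds since start, watts).
samples = [(0, 0), (1, 5), (2, 100), (3, 95), (4, 2)]
print(switch_times(samples))
```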

uptime(**load_kwargs)[source]
Returns:timedelta: total duration of all good sections.
vampire_power(**load_kwargs)[source]
when_on(on_power_threshold=None, **load_kwargs)[source]

Are the connected appliances on (True) or off (False)?

Uses self.on_power_threshold() if on_power_threshold not provided.

Parameters:

on_power_threshold : number, optional

Defaults to self.on_power_threshold()

**load_kwargs : key word arguments

Passed to self.power_series()

Returns:

generator of pd.Series

The index is the same as for the chunk returned by self.power_series(). Values are booleans.

nilmtk.electric.activation_series_for_chunk(chunk, min_off_duration=0, min_on_duration=0, border=1, on_power_threshold=5)[source]

Returns runs of an appliance.

Most appliances spend a lot of their time off. This function finds periods when the appliance is on.

Parameters:

chunk : pd.Series

min_off_duration : int

If min_off_duration > 0 then ignore ‘off’ periods of sub-threshold power consumption lasting less than min_off_duration seconds (e.g. a washing machine might draw no power for a short period while the clothes soak). Defaults to 0.

min_on_duration : int

Any activation lasting fewer seconds than min_on_duration will be ignored. Defaults to 0.

border : int

Number of rows to include before and after the detected activation

on_power_threshold : int or float

Defaults to 5.

Returns:

list of pd.Series. Each series contains one activation.
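The effect of the parameters can be illustrated on a plain list of evenly spaced power readings. This is a sketch only: find_activations is a hypothetical name, the sample_period argument is added for the example, and treating readings equal to the threshold as "on" is an assumption.

```python
def find_activations(power, sample_period=1, min_off_duration=0,
                     min_on_duration=0, border=1, on_power_threshold=5):
    """Hypothetical sketch: return a list of activations, each a slice of
    `power` (a list of watts sampled every `sample_period` seconds)."""
    # 1. Find [start, end) index runs where the appliance is on.
    runs = []
    start = None
    for i, watts in enumerate(power):
        if watts >= on_power_threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, len(power)])

    # 2. Merge runs separated by an 'off' gap shorter than min_off_duration.
    merged = []
    for run in runs:
        if merged and (run[0] - merged[-1][1]) * sample_period < min_off_duration:
            merged[-1][1] = run[1]
        else:
            merged.append(run)

    # 3. Drop activations shorter than min_on_duration, then add the border.
    activations = []
    for start, end in merged:
        if (end - start) * sample_period < min_on_duration:
            continue
        first = max(0, start - border)
        last = min(len(power), end + border)
        activations.append(power[first:last])
    return activations


power = [0, 0, 100, 120, 0, 110, 0, 0, 0, 50, 0]
print(find_activations(power, min_off_duration=2, min_on_duration=2))
```

In the usage line, the single zero reading between 120 and 110 is shorter than min_off_duration, so the two runs are treated as one activation, while the lone 50 watt reading is dropped for being shorter than min_on_duration.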

nilmtk.electric.align_two_meters(master, slave, func='power_series')[source]

Returns a generator of 2-column pd.DataFrames. The first column is from master, the second from slave.

Takes the sample rate and good_periods of master and applies to slave.

Parameters:master, slave : ElecMeter or MeterGroup instances

nilmtk.exceptions module

File defining custom nilmtk exception classes.

exception nilmtk.exceptions.MeasurementError[source]

Bases: exceptions.Exception

exception nilmtk.exceptions.PerformanceWarning[source]

Bases: exceptions.RuntimeWarning

exception nilmtk.exceptions.TooFewSamplesError[source]

Bases: exceptions.Exception

nilmtk.hashable module

class nilmtk.hashable.Hashable[source]

Bases: object

Simple mix-in class to add functions necessary to make an object hashable. Just requires the child class to have an identifier namedtuple.
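A mix-in of this kind can be sketched as follows. The class names HashableSketch and Thing are invented for the example, and the exact comparison rules in nilmtk may differ.

```python
from collections import namedtuple


class HashableSketch(object):
    """Hypothetical mix-in: equality and hashing delegate to
    self.identifier, which the child class must provide."""

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.identifier == other.identifier
        return False

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash(self.identifier)


ThingID = namedtuple('ThingID', ['instance', 'dataset'])


class Thing(HashableSketch):
    def __init__(self, instance, dataset):
        self.identifier = ThingID(instance, dataset)


# Two objects with the same identifier collapse to one entry in a set.
things = {Thing(1, 'REDD'), Thing(1, 'REDD'), Thing(2, 'REDD')}
print(len(things))
```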

nilmtk.measurement module

nilmtk.measurement.check_ac_type(ac_type)[source]
nilmtk.measurement.check_physical_quantity(physical_quantity)[source]
nilmtk.measurement.measurement_columns(column_tuples)[source]
Parameters:column_tuples : list of 2-tuples
Returns:pd.MultiIndex
nilmtk.measurement.select_best_ac_type(available_ac_types, mains_ac_types=None)[source]

Selects the ‘best’ alternating current measurement type from available_ac_types.

Parameters:

available_ac_types : list of strings

e.g. [‘active’, ‘reactive’]

mains_ac_types : list of strings, optional

if provided then will try to select the best AC type from available_ac_types which is also in mains_ac_types. If none of the measurements from mains_ac_types are available then will raise a warning and will select another ac type.

Returns:

best_ac_type : string
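The selection logic might look roughly like the sketch below. The preference order (active, then apparent, then reactive) is an assumption made for the example and is not stated on this page; the function name best_ac_type is also hypothetical.

```python
import warnings

# Assumed preference order, most preferred first.
AC_TYPE_PREFERENCE = ['active', 'apparent', 'reactive']


def best_ac_type(available_ac_types, mains_ac_types=None):
    """Hypothetical sketch of choosing the 'best' AC type."""
    candidates = list(available_ac_types)
    if mains_ac_types is not None:
        shared = [a for a in available_ac_types if a in mains_ac_types]
        if shared:
            candidates = shared
        else:
            # None of the mains AC types is available: warn and fall back
            # to whatever this meter does provide.
            warnings.warn('No AC type in common with mains; '
                          'selecting from {}'.format(candidates))
    for ac_type in AC_TYPE_PREFERENCE:
        if ac_type in candidates:
            return ac_type
    raise KeyError('No known AC type in {}'.format(candidates))


print(best_ac_type(['apparent', 'active']))
print(best_ac_type(['active', 'apparent'], mains_ac_types=['apparent']))
```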

nilmtk.metergroup module

class nilmtk.metergroup.MeterGroup(meters=None, disabled_meters=None)[source]

Bases: nilmtk.electric.Electric

A group of ElecMeter objects. Can contain nested MeterGroup objects.

Implements many of the same methods as ElecMeter.

Attributes

meters (list of ElecMeters or nested MeterGroups)
disabled_meters (list of ElecMeters or nested MeterGroups)
name (only set by functions like ‘groupby’ and ‘select_top_k’)
all_meters()[source]

Returns a list of self.meters + self.disabled_meters.

appliances[source]
available_ac_types(physical_quantity)[source]

Returns set of all available alternating current types for a specific physical quantity.

Parameters:physical_quantity : str or list of strings
Returns:list of strings e.g. [‘apparent’, ‘active’]
available_physical_quantities()[source]
Returns:list of strings e.g. [‘power’, ‘energy’]
building()[source]

Returns building instance integer(s).

call_method_on_all_meters(method)[source]

Calls method on each element in self.meters.

Parameters:

method : str

Name of a stats method in ElecMeter. e.g. ‘correlation’.

Returns:

pd.Series of result of method called on each element in self.meters.

clear_cache()[source]

Clear cache on all meters in this MeterGroup.

contains_meters_from_multiple_buildings()[source]

Returns True if this MeterGroup contains meters from more than one building.

correlation_of_sum_of_submeters_with_mains(**load_kwargs)[source]
dataframe_of_meters(**kwargs)[source]
Parameters:

sample_period : int or float, optional

Number of seconds to use as sample period when reindexing meters. If not specified then will use the max of all meters’ sample_periods.

resample : bool, defaults to True

If True then resample to sample_period.

**kwargs :

any other key word arguments to pass to self.store.load() including:

ac_type : string, defaults to ‘best’

physical_quantity: string, defaults to ‘power’

Returns:

DataFrame

Each column is a meter.

dataset()[source]

Returns dataset string(s).

describe(compute_expensive_stats=True, **kwargs)[source]

Returns pd.Series describing this MeterGroup.

dominant_appliance()[source]
dominant_appliances()[source]
draw_wiring_graph(show_meter_labels=True)[source]
dropout_rate(**load_kwargs)[source]

Sums together total energy for each meter.

Parameters:

full_results : bool, default=False

**loader_kwargs : key word arguments for DataStore.load()

Returns:

if full_results is True then return TotalEnergyResults object

else return either a single number or, if there are multiple

AC types, a pd.Series with a row for each AC type.

energy_per_meter(per_period=None, mains=None, use_meter_labels=False, **load_kwargs)[source]

Returns a pd.DataFrame where each column is a meter.identifier and each value is total energy. The index is AC types.

Does not care about wiring hierarchy. Does not attempt to ensure all channels share the same time sections.

Parameters:

per_period : None or offset alias

If None then returns absolute energy used per meter. If a Pandas offset alias (e.g. ‘D’ for ‘daily’) then will return the average energy per period.

ac_type : None or str

e.g. ‘active’ or ‘best’. Defaults to ‘best’.

use_meter_labels : bool

If True then columns will be human-friendly meter labels. If False then columns will be ElecMeterIDs or MeterGroupIDs

mains : None or MeterGroup or ElecMeter

If None then will return DataFrame without remainder. If not None then will return a Series including a ‘remainder’ row which will be mains.total_energy() - energy_per_meter.sum() and an attempt will be made to use the correct AC_TYPE.

Returns:

pd.DataFrame if mains is None else a pd.Series

entropy_per_meter()[source]

Finds the entropy of each meter in this MeterGroup.

Returns:pd.Series of entropy
fraction_per_meter(**load_kwargs)[source]

Fraction of energy per meter.

Return pd.Series. Index is meter.instance. Each value is a float in the range [0,1].

from_list(meter_ids)[source]
Parameters:

meter_ids : list or tuple

Each element is an ElecMeterID or a MeterGroupID.

Returns:

MeterGroup

classmethod from_other_metergroup(other, dataset)[source]

Assemble a new meter group using the same meter IDs and nested MeterGroups as other. This is useful for preparing a ground truth metergroup from a meter group of NILM predictions.

Parameters:

other : MeterGroup

dataset : string

The name of the dataset for the ground truth. e.g. ‘REDD’

Returns:

MeterGroup

get_labels(meter_ids, pretty=True)[source]

Create human-readable meter labels.

Parameters:meter_ids : list of ElecMeterIDs (or 3-tuples in same order as ElecMeterID)
Returns:list of strings describing the appliances.
get_timeframe()[source]
Returns:

nilmtk.TimeFrame representing the timeframe which is the union

of all meters in self.meters.

good_sections(**kwargs)[source]

Returns good sections for just the first meter.

TODO: combine good sections from every meter.

groupby(key, use_appliance_metadata=True, **kwargs)[source]

e.g. groupby(‘category’)

Returns:MeterGroup of nested MeterGroups: one per group
identifier[source]

Returns a MeterGroupID.

import_metadata(store, elec_meters, appliances, building_id)[source]
Parameters:

store : nilmtk.DataStore

elec_meters : dict of dicts

metadata for each ElecMeter

appliances : list of dicts

metadata for each Appliance

building_id : BuildingID

instance()[source]

Returns tuple of integers where each int is a meter instance.

is_site_meter()[source]

Returns True if any meters are site meters

label(**kwargs)[source]
Returns:string : A label listing all the appliance types.
load(**kwargs)[source]

Returns a generator of DataFrames loaded from the DataStore.

By default, load will load all available columns from the DataStore. Specific columns can be selected in one of two mutually exclusive ways:

  1. specify a list of column names using the cols parameter.
  2. specify a physical_quantity and/or an ac_type parameter to ask load to automatically select columns.

Each meter in the MeterGroup will first be resampled before being added. The returned DataFrame will include NaNs at timestamps where no meter had a sample (after resampling the meter).

Parameters:

sample_period : int or float, optional

Number of seconds to use as sample period when reindexing meters. If not specified then will use the max of all meters’ sample_periods.

resample_kwargs : dict of key word arguments (other than ‘rule’) to

pass to pd.DataFrame.resample()

chunksize : int, optional

the maximum number of rows per chunk. Note that each chunk is guaranteed to be of length <= chunksize. Each chunk is not guaranteed to be exactly of length == chunksize.

**kwargs :

any other key word arguments to pass to self.store.load() including:

physical_quantity : string or list of strings

e.g. ‘power’ or ‘voltage’ or ‘energy’ or [‘power’, ‘energy’]. If a single string then load columns only for that physical quantity. If a list of strings then load columns for all those physical quantities.

ac_type : string or list of strings, defaults to None

Where ‘ac_type’ is short for ‘alternating current type’. e.g. ‘reactive’ or ‘active’ or ‘apparent’. If set to None then will load all AC types per physical quantity. If set to ‘best’ then load the single best AC type per physical quantity. If set to a single AC type then load just that single AC type per physical quantity, else raise an Exception. If set to a list of AC type strings then will load all those AC types and will raise an Exception if any cannot be found.

cols : list of tuples, using NILMTK’s vocabulary for measurements.

e.g. [(‘power’, ‘active’), (‘voltage’, ‘’), (‘energy’, ‘reactive’)]. cols can’t be used if ac_type and/or physical_quantity are set.

preprocessing : list of Node subclass instances

e.g. [Clip()]

Returns:

Always return a generator of DataFrames (even if it only has a single

column).

Note

Different AC types will be treated separately.

mains()[source]
Returns:ElecMeter or MeterGroup or None
matches(key)[source]
meters_directly_downstream_of_mains()[source]

Returns new MeterGroup.

nested_metergroups()[source]
pairwise(method)[source]

Calls method on all pairs in self.meters.

Assumes method is symmetrical.

Parameters:

method : str

Name of a stats method in ElecMeter. e.g. ‘correlation’.

Returns:

pd.DataFrame of the result of method called on each

pair in self.meters.

pairwise_correlation()[source]

Finds the pairwise correlation among different meters in a MeterGroup.

Returns:pd.DataFrame of correlation between pair of ElecMeters.
pairwise_mutual_information()[source]

Finds the pairwise mutual information among different meters in a MeterGroup.

Returns:

pd.DataFrame of mutual information between

pair of ElecMeters.

plot(kind='separate lines', **kwargs)[source]
Parameters:

width : int, optional

Number of points on the x axis required

ax : matplotlib.axes, optional

plot_legend : boolean, optional

Defaults to True. Set to False to not plot legend.

kind : {‘separate lines’, ‘sum’, ‘area’, ‘snakey’, ‘energy bar’}

timeframe : nilmtk.TimeFrame, optional

Defaults to self.get_timeframe()

plot_good_sections(ax=None, label_func='instance', include_disabled_meters=True, load_kwargs=None, **plot_kwargs)[source]
Parameters:

label_func : str or None

e.g. ‘instance’ (default) or ‘label’. If None then no labels will be produced.

include_disabled_meters : bool

plot_multiple(axes, meter_keys, plot_func, kwargs_per_meter=None, pretty_label=True, **kwargs)[source]

Create multiple subplots.

Parameters:

axes : list of matplotlib axes objects.

e.g. created using fig, axes = plt.subplots()

meter_keys : list of keys for identifying ElecMeters or MeterGroups.

e.g. [‘fridge’, ‘kettle’, 4, MeterGroupID, ElecMeterID]. Each element is anything that MeterGroup.__getitem__() accepts.

plot_func : string

Name of function from ElecMeter or Electric or MeterGroup e.g. plot_power_histogram

kwargs_per_meter : dict

Provide key word arguments for the plot_func for each meter. Each key is a parameter name for plot_func. Each value is a list (same length as meters) specifying a value for this parameter for each meter, e.g. {‘range’: [(0,100), (0,200)]}

pretty_label : bool

**kwargs : any key word arguments to pass the same values to the

plot func for every meter.

Returns:

axes (flattened into a 1D list)

plot_when_on(**load_kwargs)[source]
proportion_of_energy_submetered(**loader_kwargs)[source]
Returns:float [0,1] or NaN if mains total_energy == 0
proportion_of_upstream_total_per_meter(**load_kwargs)[source]
sample_period()[source]

Returns max of all meter sample periods.

select(**kwargs)[source]

Select a group of meters based on meter metadata.

e.g.
  • select(building=1, sample_period=6)
  • select(room=’bathroom’)

If multiple criteria are supplied then these are ANDed together.

Returns:new MeterGroup of selected meters.
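The "ANDed together" behaviour can be sketched on plain metadata dicts. The function below is a stand-in for illustration: it returns a list rather than a MeterGroup, and the metadata values are made up.

```python
def select(meters, **criteria):
    """Hypothetical sketch: keep meters whose metadata satisfies
    every supplied criterion."""
    return [meter for meter in meters
            if all(meter.get(key) == value
                   for key, value in criteria.items())]


# Made-up meter metadata.
meters = [
    {'instance': 1, 'building': 1, 'sample_period': 6, 'room': 'kitchen'},
    {'instance': 2, 'building': 1, 'sample_period': 1, 'room': 'bathroom'},
    {'instance': 3, 'building': 2, 'sample_period': 6, 'room': 'bathroom'},
]
print(select(meters, building=1, sample_period=6))
```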
select_top_k(k=5, by='energy', asc=False, group_remainder=False, **kwargs)[source]

Only select the top K meters, according to energy.

Functions on the entire MeterGroup. So if you mean to select the top K from only the submeters, please do something like this:

elec.submeters().select_top_k()

Parameters:

k : int, optional, defaults to 5

by: string, optional, defaults to energy

Can select top k by:
  • energy
  • entropy

asc: bool, optional, defaults to False

By default top_k is in descending order. To select top_k by ascending order, use asc=True

group_remainder : bool, optional, defaults to False

If True then place all remaining meters into a nested metergroup.

**kwargs : key word arguments to pass to load()

Returns:

MeterGroup
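Conceptually, the selection ranks meters by a score and keeps the first k. The sketch below works on a dict of made-up energy values and returns meter names instead of a MeterGroup; treating asc=True as "keep the k smallest" is an assumption of the sketch.

```python
def select_top_k(energy_per_meter, k=5, asc=False):
    """Hypothetical sketch: names of the k meters ranked by energy."""
    ranked = sorted(energy_per_meter.items(),
                    key=lambda item: item[1],
                    reverse=not asc)
    return [name for name, _ in ranked[:k]]


# Made-up total energy per meter, in kWh.
energy = {'fridge': 3.0, 'kettle': 1.0, 'washing machine': 5.0}
print(select_top_k(energy, k=2))
```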

select_using_appliances(**kwargs)[source]

Select a group of meters based on appliance metadata.

e.g.
  • select(category=’lighting’)
  • select(type=’fridge’)
  • select(building=1, category=’lighting’)
  • select(room=’bathroom’)

If multiple criteria are supplied then these are ANDed together.

Returns:new MeterGroup of selected meters.
simultaneous_switches(threshold=40)[source]
Parameters:

threshold : number, threshold in Watts

Returns:

sim_switches : pd.Series of type {timestamp: number of

simultaneous switches}

Notes

This function assumes that the submeters in this MeterGroup are all aligned. If they are not then you should align the meters, e.g. by using an Apply node with resample.
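Counting simultaneous switches amounts to tallying, per timestamp, how many meters switched at that instant. The sketch below does this with collections.Counter on made-up, already aligned timestamps; it returns a plain dict rather than a pd.Series and keeps every timestamp, leaving any filtering to the caller.

```python
from collections import Counter


def simultaneous_switches(switch_times_per_meter):
    """Hypothetical sketch: map each timestamp to the number of meters
    that switched at that timestamp."""
    counts = Counter()
    for times in switch_times_per_meter:
        # set() ensures one meter counts at most once per timestamp.
        counts.update(set(times))
    return dict(sorted(counts.items()))


# Made-up switch times (seconds) for three aligned meters.
sim_switches = simultaneous_switches([[2, 4], [4, 7], [4]])
print(sim_switches)
```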

sort_meters()[source]

Sorts meters by instance.

submeters()[source]

Returns new MeterGroup of all meters except site_meters

total_energy(**load_kwargs)[source]

Sums together total meter_energy for each meter.

Note that this function does not return the total aggregate energy for a building. Instead this function adds up the total energy for all the meters contained in this MeterGroup. If you want the total aggregate energy then please use MeterGroup.mains().total_energy().

Parameters:

full_results : bool, default=False

**loader_kwargs : key word arguments for DataStore.load()

Returns:

if full_results is True then return TotalEnergyResults object

else return a pd.Series with a row for each AC type.

train_test_split(train_fraction=0.5)[source]
Parameters:train_fraction
Returns:split_time: pd.Timestamp where split should happen
union(other)[source]
Returns:

new MeterGroup whose set of meters is the union of self.meters and other.meters.

upstream_meter()[source]

Returns single upstream meter. Raises RuntimeError if more than 1 upstream meter.

use_alternative_mains()[source]

Swap present mains meter(s) for mains meter(s) in disabled_meters. This is useful if the dataset has multiple, redundant mains meters (e.g. in UK-DALE buildings 1, 2 and 5).

values_for_appliance_metadata_key(key, only_consider_dominant_appliance=True)[source]
Parameters:

key : str

e.g. ‘type’ or ‘categories’ or ‘room’

Returns:

list

wiring_graph()[source]

Returns a networkx.DiGraph of connections between meters.

class nilmtk.metergroup.MeterGroupID

Bases: tuple

meters

Alias for field number 0

nilmtk.metergroup.combine_chunks_from_generators(index, columns, meters, kwargs)[source]

Combines chunks into a single DataFrame.

Adds or averages columns, depending on whether each column is in PHYSICAL_QUANTITIES_TO_AVERAGE.

Returns:DataFrame
nilmtk.metergroup.iterate_through_submeters_of_two_metergroups(master, slave)[source]
Parameters:master, slave : MeterGroup
Returns:list of 2-tuples of the form (master_meter, slave_meter)
nilmtk.metergroup.meter_sorting_key(meter)
nilmtk.metergroup.replace_dataset(identifier, dataset)[source]
Parameters:identifier : ElecMeterID or MeterGroupID
Returns:ElecMeterID or MeterGroupID with dataset replaced with dataset

nilmtk.metrics module

Metrics to compare disaggregation performance against ground truth data.

All metrics functions have the same interface. Each function takes predictions and ground_truth parameters, both of which are nilmtk.MeterGroup objects. Each function returns either a pd.Series or a single float. Most functions return a pd.Series in which each index element is a meter instance int, or a tuple of ints for MeterGroups.

Notation

Below is the notation used to mathematically define each metric.

\(T\) - number of time slices.

\(t\) - a time slice.

\(N\) - number of appliances.

\(n\) - an appliance.

\(y^{(n)}_t\) - ground truth power of appliance \(n\) in time slice \(t\).

\(\hat{y}^{(n)}_t\) - estimated power of appliance \(n\) in time slice \(t\).

\(x^{(n)}_t\) - ground truth state of appliance \(n\) in time slice \(t\).

\(\hat{x}^{(n)}_t\) - estimated state of appliance \(n\) in time slice \(t\).

Functions

nilmtk.metrics.error_in_assigned_energy(predictions, ground_truth)[source]

Compute error in assigned energy.

\[error^{(n)} = \left | \sum_t y^{(n)}_t - \sum_t \hat{y}^{(n)}_t \right |\]
Parameters:

predictions, ground_truth : nilmtk.MeterGroup

Returns:

errors : pd.Series

Each index is a meter instance int (or tuple for MeterGroups). Each value is the absolute error in assigned energy for that appliance, in kWh.
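As a worked illustration of the formula, the sketch below applies it to plain lists. The name assigned_energy_error and the dict input are hypothetical, and the conversion of summed power samples to kWh is deliberately omitted.

```python
def assigned_energy_error(ground_truth, predictions):
    """Absolute difference between summed truth and summed estimate.

    Both arguments map an appliance instance to a list of per-slice
    values (hypothetical input). Unit conversion to kWh is left out.
    """
    return {
        n: abs(sum(ground_truth[n]) - sum(predictions[n]))
        for n in ground_truth
    }


truth = {1: [100, 100, 0, 0], 2: [0, 50, 50, 0]}
estimate = {1: [90, 110, 10, 0], 2: [0, 40, 40, 0]}
errors = assigned_energy_error(truth, estimate)
```

Because the sums are taken before the difference, over- and under-estimates within one appliance can cancel out.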

nilmtk.metrics.f1_score(predictions, ground_truth)[source]

Compute F1 scores.

\[F_{score}^{(n)} = \frac {2 * Precision * Recall} {Precision + Recall}\]
Parameters:

predictions, ground_truth : nilmtk.MeterGroup

Returns:

f1_scores : pd.Series

Each index is a meter instance int (or tuple for MeterGroups). Each value is the F1 score for that appliance. If there are multiple chunks then the value is the weighted mean of the F1 score for each chunk.
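A minimal sketch of the F1 computation for one appliance and a single chunk, working directly on on/off states. The function f1_from_states is hypothetical; how nilmtk derives states from power and how it weights chunks is not shown here.

```python
def f1_from_states(true_states, predicted_states):
    """F1 score from two equal-length sequences of on/off states."""
    pairs = list(zip(true_states, predicted_states))
    tp = sum(1 for x, x_hat in pairs if x and x_hat)        # on, predicted on
    fp = sum(1 for x, x_hat in pairs if not x and x_hat)    # off, predicted on
    fn = sum(1 for x, x_hat in pairs if x and not x_hat)    # on, predicted off
    if tp == 0:
        return 0.0  # convention chosen for this sketch when nothing matches
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


score = f1_from_states([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```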

nilmtk.metrics.fraction_energy_assigned_correctly(predictions, ground_truth)[source]

Compute fraction of energy assigned correctly

\[fraction = \sum_n \min \left ( \frac{\sum_t y^{(n)}_t}{\sum_{n,t} y^{(n)}_t}, \frac{\sum_t \hat{y}^{(n)}_t}{\sum_{n,t} \hat{y}^{(n)}_t} \right )\]

Ignores the distinction between different AC types: if there are multiple AC types for a meter then we just take the max value across the AC types.

Parameters:

predictions, ground_truth : nilmtk.MeterGroup

Returns:

fraction : float in the range [0,1]

Fraction of Energy Correctly Assigned.
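The sketch below evaluates this metric on plain lists, reading each numerator as one appliance's energy summed over time and each denominator as the total over all appliances and time slices (an assumption about the intended formula). The function name and inputs are hypothetical.

```python
def fraction_assigned_correctly(ground_truth, predictions):
    """Sum over appliances of min(true share, estimated share) of energy."""
    total_truth = sum(sum(values) for values in ground_truth.values())
    total_pred = sum(sum(values) for values in predictions.values())
    return sum(
        min(sum(ground_truth[n]) / total_truth, sum(predictions[n]) / total_pred)
        for n in ground_truth
    )


truth = {1: [100, 100], 2: [50, 50]}      # true shares: 2/3 and 1/3
estimate = {1: [75, 75], 2: [75, 75]}     # estimated shares: 1/2 and 1/2
fraction = fraction_assigned_correctly(truth, estimate)
```

Here the result is 1/2 + 1/3, and a perfect estimate would give exactly 1.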

nilmtk.metrics.mean_normalized_error_power(predictions, ground_truth)[source]

Compute mean normalized error in assigned power

\[error^{(n)} = \frac { \sum_t {\left | y_t^{(n)} - \hat{y}_t^{(n)} \right |} } { \sum_t y_t^{(n)} }\]
Parameters:

predictions, ground_truth : nilmtk.MeterGroup

Returns:

mne : pd.Series

Each index is a meter instance int (or tuple for MeterGroups). Each value is the MNE for that appliance.
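For a single appliance, the formula reduces to a few lines. This is a sketch on plain lists with a hypothetical function name, not the library's code path.

```python
def mean_normalized_error(y, y_hat):
    """Sum of absolute per-slice errors divided by total true power."""
    absolute_error = sum(abs(true - est) for true, est in zip(y, y_hat))
    return absolute_error / sum(y)


mne = mean_normalized_error([100, 100, 0, 0], [90, 110, 10, 0])
```

Unlike the error in assigned energy, the absolute value is taken per time slice, so errors of opposite sign do not cancel.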

nilmtk.metrics.rms_error_power(predictions, ground_truth)[source]

Compute RMS error in assigned power

\[error^{(n)} = \sqrt{ \frac{1}{T} \sum_t{ \left ( y_t^{(n)} - \hat{y}_t^{(n)} \right )^2 } }\]
Parameters:

predictions, ground_truth : nilmtk.MeterGroup

Returns:

error : pd.Series

Each index is a meter instance int (or tuple for MeterGroups). Each value is the RMS error in predicted power for that appliance.
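A worked example of the RMS formula for one appliance, again on plain lists with a hypothetical function name.

```python
import math


def rms_error(y, y_hat):
    """Square root of the mean squared per-slice error."""
    squared = [(true - est) ** 2 for true, est in zip(y, y_hat)]
    return math.sqrt(sum(squared) / len(squared))


# Per-slice errors are 6, -8, 0, 0, so the mean squared error is 25.
rms = rms_error([100, 100, 0, 0], [94, 108, 0, 0])
```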

nilmtk.node module

class nilmtk.node.Node(upstream=None, generator=None)[source]

Bases: object

Abstract class defining interface for all Node subclasses, where a ‘node’ is a module which runs pre-processing or statistics (or, later, maybe NILM training or disaggregation).

check_requirements()[source]

Checks that self.upstream.dry_run_metadata satisfies self.requirements.

Raises:UnsatisfiedRequirementsError
dry_run_metadata()[source]

Does a ‘dry run’ so we can validate the full pipeline before loading any data.

Returns:dict : dry run metadata
get_metadata()[source]
postconditions = {}
process()[source]
required_measurements(state)[source]
Returns:Set of measurements that need to be loaded from disk for this node.
requirements = {}
reset()[source]
results_class = None
run()[source]

Pulls data through the pipeline. Useful if we just want to calculate some stats.

exception nilmtk.node.UnsatisfiedRequirementsError[source]

Bases: exceptions.Exception

nilmtk.node.find_unsatisfied_requirements(state, requirements)[source]
Parameters:

state, requirements : dict

If a property is required but the specific value does not matter then use ‘ANY VALUE’ as the value in requirements.

Returns:

list of strings describing (for human consumption) which conditions are not satisfied. If all conditions are satisfied then returns an empty list.
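The matching rule, including the 'ANY VALUE' wildcard, might look roughly like the sketch below. The function unsatisfied, the recursion into nested dicts, the message wording and the example keys are all assumptions made for illustration.

```python
def unsatisfied(state, requirements):
    """Return human-readable strings for each requirement not met by state."""
    problems = []
    for key, required in requirements.items():
        if key not in state:
            problems.append("Requires '{}' but it is missing.".format(key))
        elif isinstance(required, dict) and isinstance(state[key], dict):
            # Nested requirements are checked recursively (an assumption).
            problems.extend(unsatisfied(state[key], required))
        elif required != "ANY VALUE" and state[key] != required:
            problems.append(
                "Requires '{}'={!r} but found {!r}.".format(key, required, state[key])
            )
    return problems


state = {"gaps_located": True, "energy_computed": False}
requirements = {
    "gaps_located": "ANY VALUE",   # key must exist, value is irrelevant
    "energy_computed": True,       # value must match exactly
    "clipped": True,               # missing from state
}
problems = unsatisfied(state, requirements)
```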

nilmtk.plots module

nilmtk.plots.format_axes(ax)[source]
nilmtk.plots.latexify(fig_width=None, fig_height=None, columns=1)[source]

Set up matplotlib’s RC params for LaTeX plotting. Call this before plotting a figure.

Parameters:

fig_width : float, optional, inches

fig_height : float, optional, inches

columns : {1, 2}

nilmtk.plots.plot_pairwise_heatmap(df, labels, edgecolors='w', cmap=<matplotlib.colors.LinearSegmentedColormap object>, log=False)[source]

Plots a heatmap of a ‘square’ df. Rows and columns are the same, and each value in the dataframe is the result of the computation between that row and column. This plot can be used for plotting pairwise_correlation or pairwise_mutual_information, or the output of any method which works similarly.

nilmtk.plots.plot_series(series, ax=None, fig=None, date_format='%d/%m/%y %H:%M:%S', tz_localize=True, **kwargs)[source]

Plot function for series which is about 5 times faster than pd.Series.plot().

Parameters:

series : pd.Series

ax : matplotlib Axes, optional

If not provided then will generate our own axes.

fig : matplotlib Figure

date_format : str, optional, default=’%d/%m/%y %H:%M:%S’

tz_localize : boolean, optional, default is True

if False then display UTC times.

Can also use all **kwargs expected by `ax.plot`

nilmtk.results module

class nilmtk.results.Results[source]

Bases: object

Stats results from each node need to be assigned to a specific class so we know how to combine results from multiple chunks. For example, Energy can be simply summed; while dropout rate should be averaged, and gaps need to be merged across chunk boundaries. Results objects contain a DataFrame, the index of which is the start timestamp for which the results are valid; the first column (‘end’) is the end timestamp for which the results are valid. Other columns are accumulators for the results.

Attributes

_data (DataFrame) Index is period start. Columns are: end and any columns for internal storage of stats.
append(timeframe, new_results)[source]

Append a single result.

Parameters:

timeframe : nilmtk.TimeFrame

new_results : dict

check_for_overlap()[source]
combined()[source]

Return all results from each chunk combined. Returns either a single float for all periods or, where necessary, a dict. For example, if calculating Energy for a meter which records both apparent power and active power then get active power with energyresults.combined['active']

export_to_cache()[source]
Returns:pd.DataFrame

Notes

Objects are converted using DataFrame.convert_objects(). The reason for doing this is to strip out the timezone information from data columns. We have to do this otherwise Pandas complains if we try to put a column with multiple timezones (e.g. Europe/London across a daylight saving boundary).

import_from_cache(cached_stat, sections)[source]
Parameters:

cached_stat : DataFrame of cached data

sections : list of nilmtk.TimeFrame objects describing the sections we want to load stats for.

per_period()[source]

Returns a DataFrame. Index is period start. Columns are: end and <stat name>

simple()[source]

Returns the simplest representation of the results.

timeframes()[source]

Returns a list of timeframes covered by this Result.

unify(other)[source]

Take results from another table of data (another physical meter) and merge those results into self. For example, if we have a dual-split mains supply then we want to merge the results from each physical meter. The two sets of results must be for exactly the same timeframes.

Parameters:

other : Results subclass (same class as self).

Results calculated from another table of data.

update(new_result)[source]

Add results from a new chunk.

Parameters:

new_result : Results subclass (same class as self) from new chunk of data.

nilmtk.timeframe module

class nilmtk.timeframe.TimeFrame(start=None, end=None, tz=None)[source]

Bases: object

A TimeFrame is a single time span or period, e.g. from “2013” to “2014”.

Attributes

_start (pd.Timestamp or None) if None and empty is False then behave as if start is infinitely far into the past
_end (pd.Timestamp or None) if None and empty is False then behave as if end is infinitely far into the future
enabled (boolean) If False then behave as if both _end and _start are None
_empty (boolean) If True then represents an empty time frame
include_end (boolean)
adjacent(other, gap=0)[source]

Returns True if self.start == other.end or vice versa.

Parameters:

gap : float or int

Number of seconds gap allowed.

Notes

Does not yet handle case where self or other is open-ended.

check_for_overlap(other)[source]
check_tz()[source]
clear()[source]
copy_constructor(other)[source]
empty[source]
end[source]
classmethod from_dict(d)[source]
intersection(other)[source]

Returns a new TimeFrame of the intersection between this TimeFrame and other TimeFrame. If the intersect is empty then the returned TimeFrame will have empty == True.

query_terms(variable_name='timeframe')[source]
slice(frame)[source]

Slices frame using self.start and self.end.

Parameters:frame : pd.DataFrame or pd.Series to slice
Returns:frame : sliced frame
split(duration_threshold)[source]

Splits this TimeFrame into smaller adjacent TimeFrames no longer in duration than duration_threshold.

Parameters:duration_threshold : int, seconds
Returns:generator of new TimeFrame objects
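The splitting behaviour can be sketched with standard-library datetimes. The generator split_span and its (start, end) tuples are hypothetical stand-ins for TimeFrame objects.

```python
from datetime import datetime, timedelta


def split_span(start, end, duration_threshold):
    """Yield adjacent (start, end) pairs, each at most duration_threshold seconds."""
    step = timedelta(seconds=duration_threshold)
    while start < end:
        chunk_end = min(start + step, end)
        yield (start, chunk_end)
        start = chunk_end  # the next piece begins exactly where this one ends


pieces = list(split_span(datetime(2013, 1, 1), datetime(2013, 1, 1, 2, 30), 3600))
```

A span of two and a half hours split at one hour yields three pieces, the last of which is only thirty minutes long.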
start[source]
timedelta[source]
to_dict()[source]
union(other)[source]

Return a single TimeFrame combining self and other.

nilmtk.timeframe.convert_nat_to_none(timestamp)[source]
nilmtk.timeframe.convert_none_to_nat(timestamp)[source]
nilmtk.timeframe.list_of_timeframe_dicts(timeframes)[source]
Parameters:timeframes : list of TimeFrame objects
Returns:list of dicts
nilmtk.timeframe.list_of_timeframes_from_list_of_dicts(dicts)[source]
nilmtk.timeframe.merge_timeframes(timeframes, gap=0)[source]
Parameters:

timeframes : list of TimeFrame objects (must be sorted)

Returns:

merged : list of TimeFrame objects

Where adjacent timeframes have been merged.
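A minimal sketch of merging sorted spans, using (start, end) tuples in seconds as a hypothetical stand-in for TimeFrame objects. Note that this sketch also merges overlapping spans, which may go beyond what the library does.

```python
def merge_spans(spans, gap=0):
    """Merge sorted (start, end) spans whose separation is at most `gap` seconds."""
    merged = []
    for start, end in spans:
        if merged and start - merged[-1][1] <= gap:
            # Close enough to the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


spans = [(0, 10), (10, 20), (25, 30)]
strict = merge_spans(spans)          # only touching spans are merged
relaxed = merge_spans(spans, gap=5)  # a 5 second gap is tolerated
```

The input must be sorted because each span is only compared with the most recently merged one.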

nilmtk.timeframe.split_timeframes(timeframes, duration_threshold)[source]
nilmtk.timeframe.timeframe_from_dict(d)[source]

nilmtk.timeframegroup module

class nilmtk.timeframegroup.TimeFrameGroup(timeframes=None)[source]

Bases: list

A collection of nilmtk.TimeFrame objects.

intersection(other)[source]

Returns a new TimeFrameGroup of self masked by other.

Illustrated example:

self.good_sections():  |######----#####-----######|
other.good_sections(): |---##---####----##-----###|
intersection():        |---##-----##-----------###|
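The illustrated example can be reproduced with integer positions, treating each run of # as a half-open (start, end) interval. The function intersect_groups is a hypothetical sketch and uses a simple pairwise comparison rather than whatever the library does internally.

```python
def intersect_groups(group_a, group_b):
    """Return the parts of group_a's spans that are also covered by group_b."""
    result = []
    for a_start, a_end in group_a:
        for b_start, b_end in group_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # keep only non-empty overlaps
                result.append((start, end))
    return result


# Character positions of the '#' runs in the diagram, 0-indexed.
self_sections = [(0, 6), (10, 15), (20, 26)]
other_sections = [(3, 5), (8, 12), (16, 18), (23, 26)]
masked = intersect_groups(self_sections, other_sections)
```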
plot(ax=None, y=0, height=1, gap=0.05, color='b', **kwargs)[source]
remove_shorter_than(threshold)[source]

Removes TimeFrames shorter than threshold seconds.

uptime()[source]

Returns total timedelta of all timeframes joined together.

nilmtk.utils module

nilmtk.utils.append_or_extend_list(lst, value)[source]
nilmtk.utils.capitalise_first_letter(string)[source]
nilmtk.utils.capitalise_index(index)[source]
nilmtk.utils.capitalise_legend(ax)[source]
nilmtk.utils.check_directory_exists(d)[source]
nilmtk.utils.container_to_string(container, sep='_')[source]
nilmtk.utils.convert_to_list(list_like)[source]
nilmtk.utils.convert_to_timestamp(t)[source]
Parameters:t : str or pd.Timestamp or datetime or None
Returns:pd.Timestamp or None
nilmtk.utils.dict_to_html(dictionary)[source]
nilmtk.utils.find_nearest(known_array, test_array)[source]

Find closest value in known_array for each element in test_array.

Parameters:

known_array : numpy array

consisting of scalar values only; shape: (m, 1)

test_array : numpy array

consisting of scalar values only; shape: (n, 1)

Returns:

indices : numpy array; shape: (n, 1)

For each value in test_array finds the index of the closest value in known_array.

residuals : numpy array; shape: (n, 1)

For each value in test_array finds the difference from the closest value in known_array.
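The lookup can be sketched in pure Python with bisect. This is not the library's NumPy implementation: the function nearest is hypothetical, it assumes known values are sorted in ascending order, and it takes the residual as the test value minus the nearest known value (the sign convention is an assumption).

```python
from bisect import bisect_left


def nearest(known_sorted, test_values):
    """For each test value, find the index of the closest known value."""
    indices, residuals = [], []
    for value in test_values:
        pos = bisect_left(known_sorted, value)
        # Only the neighbours either side of the insertion point can be closest.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(known_sorted)]
        best = min(candidates, key=lambda i: abs(known_sorted[i] - value))
        indices.append(best)
        residuals.append(value - known_sorted[best])
    return indices, residuals


indices, residuals = nearest([0, 10, 20], [1, 14, 26, -3])
```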

nilmtk.utils.flatten_2d_list(list2d)[source]
nilmtk.utils.get_datastore(filename, format, mode='a')[source]
Parameters:

filename : string

format : ‘CSV’ or ‘HDF’

mode : ‘a’ (append) or ‘w’ (write), optional

Returns:

metadata : dict

nilmtk.utils.get_index(data)[source]
Parameters:data : pandas.DataFrame or Series or DatetimeIndex
Returns:index : the index for the DataFrame or Series
nilmtk.utils.get_module_directory()[source]
nilmtk.utils.get_tz(df)[source]
nilmtk.utils.index_of_column_name(df, name)[source]
nilmtk.utils.most_common(lst)[source]

Returns the most common entry in lst.

nilmtk.utils.nodes_adjacent_to_root(graph)[source]
nilmtk.utils.normalise_timestamp(timestamp, freq)[source]

Returns the nearest Timestamp to timestamp which would be in the set of timestamps returned by pd.DataFrame.resample(freq=freq)

nilmtk.utils.offset_alias_to_seconds(alias)[source]

Seconds for each period length.

nilmtk.utils.print_dict(dictionary)[source]
nilmtk.utils.print_on_line(*strings)[source]
nilmtk.utils.show_versions()[source]

Prints versions of various dependencies

nilmtk.utils.simplest_type_for(values)[source]
nilmtk.utils.timedelta64_to_secs(timedelta)[source]

Convert timedelta to seconds.

Parameters:timedelta : np.timedelta64
Returns:float : seconds
nilmtk.utils.timestamp_is_naive(timestamp)[source]
Parameters:

timestamp : pd.Timestamp or datetime.datetime

Returns:

True if timestamp is naive (i.e. if it does not have a timezone associated with it). See: https://docs.python.org/2/library/datetime.html#available-types
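The naive/aware distinction from the linked Python documentation can be checked with the standard library alone. The helper is_naive below is a hypothetical sketch, not the nilmtk function, and it uses Python 3's datetime.timezone.

```python
from datetime import datetime, timezone


def is_naive(timestamp):
    """True if the timestamp carries no usable timezone information."""
    return timestamp.tzinfo is None or timestamp.tzinfo.utcoffset(timestamp) is None


naive = datetime(2014, 1, 1, 12, 0)
aware = datetime(2014, 1, 1, 12, 0, tzinfo=timezone.utc)
naive_result = is_naive(naive)
aware_result = is_naive(aware)
```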

nilmtk.utils.tree_root(graph)[source]

Returns the object that is the root of the tree.

Parameters:graph : networkx.Graph
nilmtk.utils.tz_localize_naive(timestamp, tz)[source]

nilmtk.version module

Module contents

nilmtk.teardown_package()[source]

Nosetests package teardown function (run when tests are done). See http://nose.readthedocs.org/en/latest/writing_tests.html#test-packages

Uses git to reset data_dir after tests have run.