As a demonstration, let us first load the REDD dataset (which has already been converted to HDF5 format):
proportion_of_energy_submetered reports the proportion of a building’s energy that is submetered, where 0 means no energy is submetered and 1 means all energy is submetered:
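To make the quantity concrete, here is a minimal sketch in plain pandas. It is not nilmtk's implementation: the function name `sketch_proportion_submetered` and the toy readings are hypothetical, and it assumes that mains and submeters share one regular sample period (so the period cancels out of the ratio).

```python
import pandas as pd

# Toy data (made up for illustration): four one-minute power readings in watts.
index = pd.date_range("2011-04-18", periods=4, freq="min")
mains = pd.Series([1000.0, 1200.0, 800.0, 1000.0], index=index)
submeters = pd.DataFrame(
    {
        "oven": [500.0, 600.0, 0.0, 0.0],
        "fridge": [100.0, 100.0, 100.0, 100.0],
    },
    index=index,
)


def sketch_proportion_submetered(mains: pd.Series, submeters: pd.DataFrame) -> float:
    """Hypothetical helper: submetered energy divided by mains energy.

    Assumes every channel is sampled at the same regular period, so summed
    power is proportional to energy and the sample period cancels.
    """
    total_mains = mains.sum()
    total_submetered = submeters.sum().sum()
    return float(total_submetered / total_mains)


proportion = sketch_proportion_submetered(mains, submeters)
print(proportion)
```

With irregular sampling or missing samples, summed power is no longer proportional to energy, so each sample would need to be weighted by the time it covers.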
There are two reasons why data might not be recorded:
nilmtk has a number of functions to help find periods where samples for one or more sensors were not recorded.
By default, plot_missing_samples_using_rectangles plots rectangles indicating the presence of a gap in the data, where a ‘gap’ is defined by the max_sample_period argument: if two consecutive samples are more than max_sample_period apart, that counts as a gap. The default is 4 × sample_period. The plot below shows that the two mains channels are inactive for most of the second half of May 2011:
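The gap rule itself can be sketched in a few lines of pandas. This illustrates the definition only and is not how nilmtk implements the plot: `find_gaps` and its parameter names are hypothetical, with only the 4 × sample_period default taken from the text above.

```python
from typing import Optional

import pandas as pd


def find_gaps(
    index: pd.DatetimeIndex,
    sample_period_s: float,
    max_sample_period_s: Optional[float] = None,
) -> pd.DataFrame:
    """Hypothetical helper: return one row per gap with its start and end.

    A gap is any pair of consecutive timestamps that are more than
    max_sample_period_s apart (default: 4 x sample_period_s).
    """
    if max_sample_period_s is None:
        max_sample_period_s = 4 * sample_period_s
    timestamps = index.to_series().reset_index(drop=True)
    # Seconds elapsed since the previous sample (NaN for the first sample).
    deltas = timestamps.diff().dt.total_seconds()
    is_gap = deltas > max_sample_period_s
    gaps = pd.DataFrame(
        {"start": timestamps.shift(1)[is_gap], "end": timestamps[is_gap]}
    )
    return gaps.reset_index(drop=True)


# Toy timestamps (made up): 3-second sampling with one 51-second hole.
index = pd.to_datetime(
    [
        "2011-05-01 00:00:00",
        "2011-05-01 00:00:03",
        "2011-05-01 00:00:06",
        "2011-05-01 00:00:09",
        "2011-05-01 00:01:00",
        "2011-05-01 00:01:03",
    ]
)
gaps = find_gaps(index, sample_period_s=3)
print(gaps)
```

Each row of `gaps` corresponds to one rectangle in a plot of this kind: it spans from the last sample before the hole to the first sample after it.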
The advantages of plot_missing_samples_using_rectangles are:
The disadvantages are:
To overcome both of these disadvantages, we have a sister function:
Here, the darkness of the blue colour indicates the proportion of samples lost, where dark blue means all samples are lost, light blue means some samples are lost and white means no samples are lost. In comparison to the plot_missing_samples_using_rectangles plot, the plot_missing_samples_using_bitmap function shows us that the circuits in REDD always lose >20% of their samples, but these dropouts are spread evenly.
Let’s get a more precise understanding of the dropout rate of a REDD circuit by getting the dropout rate per day:
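As a rough sketch of the idea (plain pandas, not nilmtk's own function), the dropout rate for a day can be taken as one minus the ratio of samples received to samples expected. The helper name `dropout_rate_per_day` and the toy data are assumptions made for illustration, and the sketch assumes every day in the series is a full day of recording.

```python
import pandas as pd


def dropout_rate_per_day(series: pd.Series, sample_period_s: float) -> pd.Series:
    """Hypothetical helper: fraction of expected samples missing on each day."""
    expected_per_day = 86400 / sample_period_s
    received_per_day = series.resample("D").count()
    # Clip at zero in case a day holds more samples than expected.
    return (1 - received_per_day / expected_per_day).clip(lower=0)


# Toy data (made up): two days of hourly samples, with six samples
# removed from the first day.
index = pd.date_range("2011-04-18", periods=48, freq="60min")
series = pd.Series(1.0, index=index).drop(index[:6])

rates = dropout_rate_per_day(series, sample_period_s=3600)
print(rates)
```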
And a histogram of power consumption:
So we now know that the oven spends a lot of its time consuming about 2-50 watts, but it appears to be properly ‘on’ when it’s consuming over 1600 watts. So let’s use 1000 watts as the on-power threshold.
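The reasoning behind the threshold can be sketched with NumPy and pandas. The readings below are made up to mimic the two clusters described above, and the names `ON_POWER_THRESHOLD_W` and `is_on` are illustrative rather than part of nilmtk.

```python
import numpy as np
import pandas as pd

ON_POWER_THRESHOLD_W = 1000  # chosen to sit between the two clusters

# Toy oven readings in watts (made up): a standby cluster and an 'on' cluster.
power = pd.Series(
    [5.0, 40.0, 1650.0, 1700.0, 30.0, 1800.0],
    index=pd.date_range("2011-04-18 12:00", periods=6, freq="min"),
)

# A coarse histogram shows the empty band between standby and 'on' power.
counts, bin_edges = np.histogram(power, bins=[0, 50, 1000, 2000])

# Any sample at or above the threshold is treated as 'on'.
is_on = power >= ON_POWER_THRESHOLD_W
fraction_on = is_on.mean()

print(counts, fraction_on)
```

Any threshold inside the empty band would classify these samples identically; 1000 watts simply leaves a comfortable margin on both sides.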
And some more stats:
And we can plot some histograms to get an understanding of the behaviour of an appliance. Let’s see the usage of the appliance hour-by-hour over an average day:
Not surprisingly, the oven is used most often around lunch and dinner times.
Or the behaviour day-by-day over an average week:
We can see that not much cooking was done in the middle of the week.
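Both usage histograms can be sketched by finding the moments at which the appliance switches on and then counting those events by hour of day or by day of week. This is an illustration in plain pandas under assumed toy data; `on_events` is a hypothetical helper, not a nilmtk function.

```python
import pandas as pd


def on_events(is_on: pd.Series) -> pd.DatetimeIndex:
    """Hypothetical helper: timestamps where the appliance goes from off to on."""
    previous = is_on.shift(1, fill_value=False)
    return is_on.index[(is_on & ~previous).to_numpy()]


# Toy on/off signal (made up): two days of hourly samples starting on a Monday.
index = pd.date_range("2011-04-18", periods=48, freq="60min")
is_on = pd.Series(False, index=index)
is_on.iloc[[12, 18, 36]] = True  # Monday 12:00, Monday 18:00, Tuesday 12:00

starts = on_events(is_on)

# Number of switch-on events in each hour of the day (0-23).
per_hour = pd.Series(starts.hour).value_counts().reindex(range(24), fill_value=0)

# Number of switch-on events on each day of the week (0 = Monday).
per_weekday = (
    pd.Series(starts.dayofweek).value_counts().reindex(range(7), fill_value=0)
)

print(per_hour[per_hour > 0])
print(per_weekday[per_weekday > 0])
```

Counting switch-on events rather than 'on' samples means one long cooking session contributes once, which is one of several reasonable ways to define usage.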
Let’s find out the length of time that the oven tends to be active for across the dataset.
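One way to measure this, sketched in plain pandas rather than with nilmtk's own function, is to group consecutive 'on' samples into runs and time each run. `activation_durations` is a hypothetical helper, and it assumes regular sampling so that each run lasts from its first sample to one sample period after its last.

```python
import pandas as pd


def activation_durations(is_on: pd.Series, sample_period: pd.Timedelta) -> pd.Series:
    """Hypothetical helper: duration of each unbroken run of 'on' samples."""
    # A new run starts whenever the on/off state differs from the previous sample.
    run_id = is_on.ne(is_on.shift(1, fill_value=False)).cumsum()
    timestamps = is_on.index.to_series()
    runs = timestamps[is_on].groupby(run_id[is_on])
    # Assumption: the last sample of a run covers one full sample period.
    return (runs.max() - runs.min() + sample_period).reset_index(drop=True)


# Toy on/off signal (made up): one-minute samples with two activations.
index = pd.date_range("2011-04-18 12:00", periods=10, freq="min")
is_on = pd.Series(
    [False, True, True, True, False, False, True, True, False, False],
    index=index,
)

durations = activation_durations(is_on, sample_period=pd.Timedelta(minutes=1))
print(durations)
```

A histogram of `durations` then shows how long the appliance typically stays active.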