Short-term forecasting of individual household electricity loads with investigating impact of data resolution and forecast horizon

Smart grid components such as smart home and battery energy management systems, high penetration of renewable energy systems, and demand response activities require accurate electricity demand forecasts for the successful operation of electricity distribution networks. For example, to optimize residential PV generation and electricity consumption, and to plan battery charge-discharge regimes by scheduling household appliances, forecasts need to target, and be tailored to, individual household electricity loads. The recent uptake of smart meters provides easier access to electricity readings at very fine resolutions, so this readily available data source can be used to build forecast models. In this paper, models that predominantly use smart meter data alongside weather variables, referred to as smart meter based models (SMBM), are implemented to forecast individual household loads. Well-known machine learning models, namely artificial neural networks (ANN), support vector machines (SVM) and least-squares SVM (LS-SVM), are implemented within the SMBM framework and their performance is compared. The analysed household stock consists of 14 households from the state of New South Wales, Australia, each with at least a year's worth of 5-min resolution data. To make the results comparable between households, our study first characterizes household load profiles according to their volatility and reveals the relationship between load standard deviation and forecast performance. The analysis extends previous research by evaluating forecasts at four data resolutions (5, 15, 30 and 60 min), with each resolution analysed for four horizons (1, 6, 12 and 24 h ahead). Both data resolution and forecast horizon proved to have a significant impact on forecast performance, and the results provide important insights for the operation of various smart grid applications.
Finally, it is shown that the load profile of some households varies significantly across days; as a result, providing a single model for the entire period may give limited performance. Using a pre-clustering step, similar daily load profiles are grouped together according to their standard deviation, and instead of applying one SMBM to the entire data set of a particular household, a separate SMBM is applied to each cluster. This preliminary clustering step increases the complexity of the analysis; however, it results in significant improvements in forecast performance.


Introduction
Residential buildings make up a significant proportion of end-use electricity demand. The U.S. Energy Information Administration projects that around 30% of global electricity end use will be attributed to the residential sector by 2020 [1]. However, the residential energy sector has received limited research interest compared to the commercial and industrial sectors [2]. Amongst the forecast studies that targeted the residential sector, the main focus has been on aggregate regional or district-level electricity loads over medium-term (a week to a year ahead) to long-term (more than a year ahead) horizons. Due to a lack of financial interest, little emphasis has been given to the analysis and forecasting of individual household consumption [2,3]. On the other hand, with the recent advancement of smart grid technologies, there is growing interest in understanding residential electricity consumption at the individual household level. In particular, the analysis and accurate prediction of electricity loads at this level can improve the effectiveness of smart grid applications, including smart home energy management systems, battery energy management tools, and demand response operations.
At the same time, there has been a great expansion in the deployment of smart meters globally. Many governments mandate utilities to equip their customers with advanced metering infrastructure (AMI) [4], which is a necessary tool for accommodating smart grid technologies and for helping networks respond to and shape future residential electricity demand. AMI can monitor the load at various data resolutions, and transmit and receive consumption information between utility companies and households. AMI data enables high-resolution household consumption analysis, which can be used effectively for very short-term (less than an hour ahead) and short-term (an hour to a week ahead) consumption forecasts.
Individual household electricity load profiles exhibit high volatility. Compared with larger-scale loads such as commercial buildings, substations or regional electricity loads, load variance and uncertainty are much greater at the household level. This is mainly caused by the unpredictability of occupant behavior and relatively small base loads [5]. Forecasting electricity loads at the individual household scale is therefore a greater challenge.
Previous load forecast studies on individual households that used smart meter and weather data within machine learning models, i.e. smart meter based models (SMBM), mainly worked with a single data resolution, with the forecast horizon commonly chosen to be either one hour or 24 h ahead. Edwards et al. [6] carried out hour-ahead forecasts of hourly loads for three research houses in Tennessee. The houses were unoccupied and fitted with automatically controlled appliances to simulate occupancy. The authors used artificial neural networks (ANN) and least squares support vector regression (LS-SVR); the latter achieved the best overall mean absolute percentage error (MAPE), ranging between 16 and 21% for the three households. Ahmed et al. [7] used ANN to predict day-ahead household loads at 15 min resolution. Although the household stock consisted of 28 households, the authors presented results for a single household over a limited 15-day test period, where the model achieved a MAPE of 13.90%. Rodrigues et al. [8] also used ANN, but focused on daily maximum loads in addition to hourly loads. The authors only presented results for two households over a short test period of three days, where the MAPE was within 23.5% for the hourly loads. Ghofrani et al. [9] carried out forecast analysis for several very short-term horizons (15, 30 and 60 min ahead); however, the results were reported for only a single test day, with overall MAPEs of 12.9%, 18.3% and 30.4% for the three horizons, respectively. Gajowniczek et al. [10] forecasted day-ahead hourly loads of sample households using ANN and SVR models. The results were reported for a single household, where the MSE was around 0.1 kWh. It is important to note that the example load profiles shown in that study had low volatility, with load values below 0.5 kWh.
This study incorporates data from 14 households sourced from the Solar Analytics Solar Smart Monitor database. The households are located in the state of New South Wales (NSW), Australia, and have at least one year's worth of 5 min resolution data. ANN, support vector regression (SVR) and LS-SVR models are used, as they have previously shown promising results [6,11].
To address the limitations found so far in the literature, our analysis focuses on two points. First, so that the SMBM can capture seasonal effects at different scales (daily, weekly, monthly and seasonal), only households with at least a complete year's worth of data were analysed. Second, to make the results more comparable and meaningful, our study first analyses the households according to their load profile volatility. The remainder of this paper is organized as follows. In Section 2, the SMBM is introduced, followed by descriptions of the forecast and clustering methodologies. The first part of Section 3 presents the results of the initial forecast and statistical analysis. Next, results of the data resolution vs. forecast horizon analysis are shown, along with a discussion of why these two parameters affect forecast performance. The section concludes with the results obtained from the combined clustering and forecast method. Finally, concluding remarks are given in Section 4.

Methodology
SMBMs use historical loads and weather variables, alongside temporal variables (hour of day, day of week, holiday, etc.), as inputs for producing forecasts. The data resolution can be chosen depending on the available smart meter data, and the forecast horizon depending on the application. This study uses four data resolutions (5, 15, 30 and 60 min) over four horizons (1, 6, 12 and 24 h ahead). Historical load and weather variables are selected using the autocorrelation selection method [12]. Depending on the household, chosen resolution and forecast horizon, the predictor matrix X consists of 15-25 variables, while the target loads are organized as an output vector Y; together they are used to train, validate and test the chosen forecast models (ANN, SVR and LS-SVR). All data analysis and forecast models are implemented in MATLAB 2016b, using the Statistics and Machine Learning and Neural Network Toolboxes [13].
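The lag-selection step can be illustrated with a short Python sketch. It assumes a simple rule (keep the `n_lags` lags with the largest sample autocorrelation among those at least as long as the forecast horizon), which may differ from the exact procedure of [12]; weather and calendar inputs are omitted, the data are synthetic, and the function names `select_lags` and `build_xy` are hypothetical.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of series x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[:-k], d[k:]) / denom for k in range(1, max_lag + 1)])

def select_lags(x, horizon, max_lag, n_lags):
    """Return the n_lags most autocorrelated lags that are >= horizon (in steps).

    Lags shorter than the horizon are not yet observed when the forecast is
    issued, so they are never candidates.
    """
    acf = autocorrelation(x, max_lag)
    candidates = np.arange(horizon, max_lag + 1)
    ranked = candidates[np.argsort(acf[candidates - 1])[::-1]]  # highest ACF first
    return np.sort(ranked[:n_lags])

def build_xy(x, lags):
    """Predictor matrix X (one column per lag) and aligned target vector Y."""
    x = np.asarray(x, dtype=float)
    start = int(max(lags))
    X = np.column_stack([x[start - lag:len(x) - lag] for lag in lags])
    Y = x[start:]
    return X, Y

# Synthetic hourly load with a daily cycle (eight weeks), for illustration only.
rng = np.random.default_rng(0)
t = np.arange(24 * 7 * 8)
load = 1.0 + 0.5 * np.sin(2 * np.pi * t / 24) + 0.02 * rng.standard_normal(t.size)
lags = select_lags(load, horizon=6, max_lag=168, n_lags=3)
X, Y = build_xy(load, lags)
```

Each row of `X` holds the selected past loads for one target in `Y`, so the pair can be passed directly to any of the three regression models.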

Artificial neural networks (ANN)
ANN models are well known for their ability to model complex, non-linear relationships between inputs and outputs [14] and have been used in many load forecasting problems [15]. The models used in our study consist of three layers: a single input, hidden and output layer. To choose the number of neurons in the hidden layer, we tested hidden layers of 2 to 24 neurons while observing the cross-validation error; 12 neurons was found to be optimal. For training, the Levenberg-Marquardt backpropagation method was chosen for its faster computation time and competitive accuracy relative to computationally more expensive methods [16]. To mitigate convergence to local minima, which is one of the inherent problems of ANN, an ensemble of 100 ANNs is used instead of a single ANN, as suggested by [16].
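The ensemble idea can be sketched in NumPy. This is an illustration under stated assumptions, not the authors' MATLAB setup: plain full-batch gradient descent stands in for Levenberg-Marquardt, the ensemble output is assumed to be the mean of the member predictions, and the members differ only in their random initialization. The network has one hidden layer of 12 tanh neurons, matching the text; function names are hypothetical.

```python
import numpy as np

def train_mlp(X, y, n_hidden=12, epochs=2000, lr=0.1, seed=0):
    """Train a one-hidden-layer tanh network on mean squared error.

    Full-batch gradient descent is a stand-in for Levenberg-Marquardt training.
    """
    rng = np.random.default_rng(seed)            # the seed sets the random initialization
    n, n_in = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), n_hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # hidden activations, shape (n, n_hidden)
        err = H @ W2 + b2 - y                    # residuals, shape (n,)
        dZ = np.outer(err, W2) * (1.0 - H ** 2)  # error backpropagated to hidden pre-activations
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dZ) / n
        b1 -= lr * dZ.mean(axis=0)
    return W1, b1, W2, b2

def predict_mlp(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

def ensemble_predict(X_train, y_train, X_test, n_models=100, n_hidden=12):
    """Average the outputs of n_models networks that differ only in initialization."""
    preds = [predict_mlp(train_mlp(X_train, y_train, n_hidden, seed=s), X_test)
             for s in range(n_models)]
    return np.mean(preds, axis=0)

# Small synthetic regression problem; 5 members instead of 100 to keep it fast.
rng = np.random.default_rng(42)
X_train = rng.uniform(-1, 1, (200, 2))
y_train = np.sin(np.pi * X_train[:, 0]) + 0.5 * X_train[:, 1]
X_test = rng.uniform(-1, 1, (100, 2))
y_test = np.sin(np.pi * X_test[:, 0]) + 0.5 * X_test[:, 1]
y_pred = ensemble_predict(X_train, y_train, X_test, n_models=5)
mse = np.mean((y_pred - y_test) ** 2)
```

Averaging over independently initialized members reduces the variance caused by any single network settling in a poor local minimum, which is the motivation given in the text.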

Support vector regression (SVR)
Support vector machines (SVM), introduced by Vapnik et al. [17], were originally designed to solve classification problems. SVR is a variant of SVM for numerical regression problems. One advantage of SVR over ANN is that SVR training yields a unique solution that corresponds to a global minimum [18]. On the other hand, parameters such as C (the cost parameter), ε (the acceptable error range) and the kernel function parameter need to be optimized to obtain the best results, a process known as parameter tuning. Amongst the various kernel types, the radial basis function kernel is used for our forecast problem due to its common use in the area [11,19]. For more detailed information on SVR and its tuning parameters, please refer to [20].
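The roles of the three tuning parameters can be made concrete by writing out the quantities they control. The Python sketch below defines the RBF kernel (width parameter `gamma`), the ε-insensitive loss (tube width `eps`) and the regularized SVR objective, in which `C` trades model flatness against training error. It only evaluates the objective for given coefficients; actually fitting an SVR requires a quadratic-programming solver, which is not reproduced here. All names are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def eps_insensitive(y, y_hat, eps):
    """Epsilon-insensitive loss: errors inside the +/- eps tube cost nothing."""
    return np.maximum(np.abs(y - y_hat) - eps, 0.0)

def svr_objective(beta, b, K, y, C, eps):
    """Regularized SVR objective for f(x_i) = sum_j beta_j K(x_j, x_i) + b.

    The flatness term 0.5 * ||w||^2 is written in kernel form as 0.5 * beta^T K beta.
    """
    f = K @ beta + b
    return 0.5 * beta @ K @ beta + C * eps_insensitive(y, f, eps).sum()

X = np.array([[0.0], [1.0], [2.0]])
K = rbf_kernel(X, X, gamma=0.5)
```

A grid search would fit the model for each (C, ε, γ) combination and keep the one with the lowest validation error: a larger `C` penalizes errors outside the tube more heavily, a larger `eps` widens the tube of ignored errors, and a larger `gamma` makes the kernel more local.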

Least squares support vector regression (LS-SVR)
LS-SVR is an extension of SVR that links SVR to important frameworks such as Gaussian processes, Bayesian inference (probabilistic interpretations and inference) and Fisher discriminant analysis [21]. Unlike SVR, LS-SVR's objective function is based on the least squares method, so training reduces to solving a system of linear equations. For the implementation of LS-SVR, the LS-SVMlab toolbox [22], a library designed to operate in the MATLAB environment, was used. A similar tuning procedure was used for LS-SVR to find the optimum parameters (regularization, kernel function and cost parameters). For more details on the LS-SVR representation and method, please refer to [23].
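Because LS-SVR uses equality constraints and a squared-error cost, its training amounts to one linear solve. The NumPy sketch below follows the standard LS-SVM regression formulation, assuming an RBF kernel with width `sigma` and regularization constant `gamma`; it is not the LS-SVMlab implementation, and its kernel scaling may differ from that toolbox's. Function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """K[i, j] = exp(-||A[i] - B[j]||^2 / (2 sigma^2))."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def lssvr_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                          # constraint: sum of alpha is zero
    A[1:, 0] = 1.0                          # bias term in every fitting equation
    A[1:, 1:] = K + np.eye(n) / gamma       # regularized kernel matrix
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                  # bias b, support values alpha

def lssvr_predict(X_train, b, alpha, X_new, sigma):
    """f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

X = np.linspace(0.0, 1.0, 30)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
b, alpha = lssvr_fit(X, y, gamma=100.0, sigma=0.2)
pred = lssvr_predict(X, b, alpha, X, sigma=0.2)
```

Unlike SVR, every training point receives a non-zero `alpha` (no sparsity), and each training residual equals `alpha_i / gamma`, which is why a larger `gamma` fits the training data more closely.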

Clustering
The initial forecast analysis showed a significant relationship between model performance and household load profile volatility, as shown in Figure 2.
Since the daily electricity profiles of a household vary significantly over days, using a single model for the entire analysed period may result in limited forecast performance. With an additional clustering step, similar daily load profiles can be grouped according to their volatility and a separate model can be applied to each group. For the clustering of daily profiles, five households were chosen as representative of distinctive variance, mean and box plot statistics (Fig. 2). To measure daily load profile volatility, the daily load variance and the daily peak index [24] were used:

Daily Load Variance: $$\sigma^2_{\text{daily}} = \frac{1}{24}\sum_{i=1}^{24}\left(P_i - \mu\right)^2$$

Daily Peak Index: $$\text{PI}_{\text{daily}} = \frac{P_{\text{daily max}}}{\mu}$$

where P_i is the household load at hour i, P_daily max is the daily maximum load and μ is the daily average load. Consequently, for a household with n days, the cluster matrix X_cluster is an n × 2 matrix.
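Building the clustering input can be sketched in NumPy, assuming hourly data (24 values per day) and reading the peak index as the ratio of the daily maximum load to the daily mean load, P_daily max / μ. The function name is hypothetical.

```python
import numpy as np

def daily_features(hourly_load):
    """Return an (n_days, 2) matrix of [daily load variance, daily peak index]."""
    P = np.asarray(hourly_load, dtype=float).reshape(-1, 24)  # one row per day
    mu = P.mean(axis=1)                                       # daily average load
    variance = ((P - mu[:, None]) ** 2).mean(axis=1)          # (1/24) * sum (P_i - mu)^2
    peak_index = P.max(axis=1) / mu                           # P_daily_max / mu
    return np.column_stack([variance, peak_index])

# A flat day and a day with a single large spike.
hourly = np.concatenate([np.ones(24), np.r_[np.ones(23), 25.0]])
F = daily_features(hourly)
```

For the flat day both features take their minimum values (variance 0, peak index 1), while the spiky day scores a variance of 23 and a peak index of 12.5, so the two measures separate calm and volatile days.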
Daily load profiles are clustered using the well-known K-means technique with the squared Euclidean distance [18]. To decide on the optimum number of clusters, we used the silhouette method, which measures how similar a point is to the other points in its own cluster compared with the points in other clusters [25]. To overcome potential problems of K-means, such as sensitivity to the random initial assignment and convergence to a local rather than global minimum [18], K-means is run 100 times, and the cluster assignment with the minimum within-cluster variation is used for the subsequent forecast step. The methodology described in the preceding sections is summarized in Table 1.
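The clustering procedure can be sketched in NumPy: Lloyd's K-means with squared Euclidean distance, restarted 100 times with the lowest within-cluster sum of squares kept, and the number of clusters chosen by the mean silhouette value. The candidate range of k (2 to 6), the Euclidean distance inside the silhouette and the random initialization scheme are assumptions, not details given in the text; the authors worked in MATLAB, and the function names here are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_init=100, max_iter=100, seed=0):
    """Lloyd's K-means (squared Euclidean), restarted n_init times; keep the lowest SSE."""
    rng = np.random.default_rng(seed)
    best_labels, best_sse = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centres
        for _ in range(max_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        sse = d2[np.arange(len(X)), labels].sum()               # within-cluster variation
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels, best_sse

def silhouette(X, labels):
    """Mean silhouette value (Euclidean distances); singleton clusters score 0."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue
        a = D[i, own].sum() / (own.sum() - 1)                   # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

def best_k(X, k_range=range(2, 7), n_init=100):
    """Number of clusters with the highest mean silhouette."""
    return max(k_range, key=lambda k: silhouette(X, kmeans(X, k, n_init=n_init)[0]))

# Three well-separated synthetic groups standing in for (variance, peak index) pairs.
rng = np.random.default_rng(3)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centres])
k = best_k(X)
labels, _ = kmeans(X, k)
```

In practice the two features would usually be standardized before clustering, since the daily variance and the dimensionless peak index are on very different scales; the text does not say whether this was done.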

Error metrics
To assess and compare the performance of the forecasting models, three commonly used error metrics are used:

Mean Absolute Error (MAE): $$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

Root Mean Square Error (RMSE): $$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

Mean Bias Error (MBE): $$\text{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)$$

where y_i and ŷ_i are the real and predicted load values, respectively. In an individual household forecast analysis, the models may perform relatively well on the majority of points but fail on some larger loads (spikes). These failures can strongly influence the error metrics and hence give misleading information about model performance. Therefore, to better characterize model performance, we also used the following accuracy metrics [19]:

For loads smaller than 500 W: Accuracy_small

For loads greater than 500 W: $$\text{Accuracy}_{\text{large}} = \begin{cases} 1 & \text{if } \left|y_i - \hat{y}_i\right| \le 0.15\,y_i \\ 0 & \text{if } \left|y_i - \hat{y}_i\right| > 0.15\,y_i \end{cases}$$

Through these metrics, model performance can be evaluated separately on large and small loads, so the relationship between model performance and load scale can be investigated in more detail.
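A minimal Python sketch of MAE, RMSE, MBE and the large-load accuracy (the share of points above 500 W predicted within 15% of the real value). The MBE sign convention (positive means under-prediction) is an assumption, and the small-load accuracy rule is not reproduced because its threshold is not recoverable here. Function names are hypothetical.

```python
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mbe(y, y_hat):
    # Assumed sign convention: positive values mean the model under-predicts.
    return np.mean(y - y_hat)

def accuracy_large(y, y_hat, threshold_w=500.0, rel_tol=0.15):
    """Share of points with y > threshold_w predicted within rel_tol * y of the real value."""
    mask = y > threshold_w
    return np.mean(np.abs(y[mask] - y_hat[mask]) <= rel_tol * y[mask])

y = np.array([100.0, 600.0, 1000.0])      # real loads (W)
y_hat = np.array([150.0, 650.0, 1200.0])  # predicted loads (W)
```

In this toy case the 1000 W point has an error of 200 W, which exceeds 15% of its value, so only one of the two large loads counts as accurate and `accuracy_large` is 0.5, even though the MAE (100 W) looks moderate.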

Results and discussion
Summary statistics of the hourly loads of the 14 households are presented as box plots in Figure 1. Household electricity load profiles exhibit a significant number of outliers, shown as red marks. In practice, these correspond to peak loads (spikes) that occur when electricity-intensive appliances are turned on.
For the initial forecast analysis, a 24 h forecast horizon is used with hourly resolution data. Figure 2 below shows the RMSE of the three models against the household load standard deviation. The plot reveals that RMSE increases with standard deviation: as load profiles become more volatile, forecasting becomes more difficult for the models. SVR outperforms the other two models for most households. More detailed results obtained with SVR are shown in Figure 3a and b.
Figure 3a shows that the SVR model is significantly more accurate at predicting smaller loads, which is also observed for the other two models. Note that the accuracy metrics depend on user-chosen thresholds (Sect. 2.5) and hence may vary with different threshold values.
Figure 3b shows forecast results for five representative households (IDs 1, 5, 8, 11 and 13) at hourly resolution and a 24 h-ahead horizon. The model generally under-predicts; in particular, the difference between real and predicted loads is more significant for larger loads. These results again show the challenge of predicting larger consumption caused by the use of electricity-intensive appliances at unpredictable hours.
The impact of resolution and horizon on forecast performance is investigated on the same five households. The SVR model outperformed the others for almost all resolutions and horizons. The RMSE results are shown in Figure 4. For a given resolution, errors stay relatively constant across the three horizons of 6, 12 and 24 h ahead, and the smallest errors are obtained at the hour-ahead horizon. This can be explained by examining the predictor variable selection performed by the autocorrelation method, which selects the historical loads that show the highest correlation with the target forecast load. For example, at hourly resolution, the highest correlation is obtained at the 1st lag (previous hour), followed by the 24th lag (previous day) and multiples of 24 up to the 168th lag (previous week). The lags in between show less significant correlation. Since lags shorter than the forecast horizon are not yet available when the forecast is made, the predictor matrices for the 6, 12 and 24 h horizons include very similar variables, and consequently their results are very similar. On the other hand, the 1 h-ahead predictor matrix includes the 1st lag in addition to the other highly correlated lags, and hence its results are better than those for the other three horizons. Note that predictor selection by autocorrelation depends strongly on the household consumption profile, so the most highly correlated lags may vary significantly between households.
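The horizon argument can be made concrete with a few lines of Python, assuming (as described for hourly data) that the strongly correlated lags are lag 1 and the multiples of 24 up to 168, and that the model forecasts directly h steps ahead so only lags of at least h steps are usable. The helper name is hypothetical.

```python
def admissible_lags(candidate_lags, horizon_steps):
    """Keep only lags that are already observed when forecasting horizon_steps ahead."""
    return [lag for lag in candidate_lags if lag >= horizon_steps]

# Highly correlated hourly lags: lag 1, then 24, 48, ..., 168.
high_corr_lags = [1] + list(range(24, 169, 24))
predictor_lags = {h: admissible_lags(high_corr_lags, h) for h in (1, 6, 12, 24)}
for h, lags in predictor_lags.items():
    print(f"{h:>2} h ahead: {lags}")
```

Only the 1 h-ahead set contains lag 1; the 6, 12 and 24 h sets are identical, which is consistent with their near-identical RMSE values.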
Another important point is the impact of resolution on forecast performance. The results show that errors increase at higher data resolutions. This outcome is not surprising, as all the lower resolutions are obtained by averaging the 5-min raw data, which smooths out short spikes. Hence, both the variance and the magnitude of peak loads (spikes) are much greater for higher-resolution data. This is shown in Figure 5 below, where the load profiles of Household 11 are plotted at the four resolutions between Jan 01 and Jan 15 2015.
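The smoothing effect of averaging can be checked with a short NumPy sketch on synthetic 5-min data (two weeks, with a few added appliance spikes). The data, spike sizes and the `downsample` helper are illustrative assumptions, not the study's measurements.

```python
import numpy as np

def downsample(load_5min, minutes):
    """Average consecutive 5-min readings into intervals of `minutes` length."""
    step = minutes // 5
    x = np.asarray(load_5min, dtype=float)
    n = len(x) // step * step              # drop an incomplete final interval
    return x[:n].reshape(-1, step).mean(axis=1)

rng = np.random.default_rng(1)
load_5min = 300 + rng.exponential(200, size=12 * 24 * 14)          # two weeks of 5-min data (W)
load_5min[rng.choice(load_5min.size, 40, replace=False)] += 3000.0  # short appliance spikes
stats = {}
for res in (5, 15, 30, 60):
    x = downsample(load_5min, res)
    stats[res] = (x.var(), x.max())
    print(f"{res:>2} min: variance {x.var():.0f} W^2, peak {x.max():.0f} W")
```

With equal-sized, nested averaging windows, the variance of the averaged series can never exceed that of the finer series, and neither can its peak, so coarser resolutions present an easier, smoother target.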
As previously discussed, different households may exhibit different load profiles. Moreover, a particular household's daily electricity load profile may vary significantly between days. An example is given in Figure 6 below, where eight daily load profiles are drawn for Household 8 between January 01 and January 08 2015. The daily profiles clearly have distinctive characteristics: load profile shape, mean, peak, time of peak and standard deviation are notably different for each profile.
Machine learning algorithms are expected to capture these daily profile changes with the help of predictor variables such as historical loads, calendar and time information. However, it is not straightforward to determine how well these changes are captured by the given inputs, especially with machine learning models, which are also known as "black box" models. This motivates assisting the models by pre-grouping similar daily load profiles into clusters and investigating forecast performance when a separate model is applied to each cluster. Results of an example cluster assignment are shown in Figure 7 for Household 1. In this example, the optimum number of clusters was found to be four: cluster 1 represents days with minimal morning consumption but more significant usage after work hours; cluster 2 represents working days with morning and evening peaks, where the former is more pronounced; cluster 3 represents minimal household activity, when household members are mostly away from the dwelling; cluster 4 represents days with a mid-day peak, which may correspond to weekends with a later start.
All cluster profiles show minimal activity during late night and early morning hours, when household members are sleeping.
The performance improvement obtained by combining the clustering approach with the SMBM, relative to the SMBM alone, is shown in Figure 8. Given the superior performance of SVR in the previous analysis, it was selected as the preferred technique for this combined method.
The pre-clustering step improves the forecast performance, given as RMSE as a percentage of the average household load. The improvement is as high as 14% for HH11, and 9% on average across the household stock.
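The combined method can be sketched in Python. To keep the example self-contained, ordinary least squares stands in for the SVR actually used, the data are synthetic with two regimes, and, as in the idealized evaluation behind Figure 8, each test point is assumed to come with its correct cluster label. All names are illustrative.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept (a stand-in for SVR)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def rmse_pct(y, y_hat):
    """RMSE expressed as a percentage of the average load."""
    return 100.0 * np.sqrt(np.mean((y - y_hat) ** 2)) / np.mean(y)

def clustered_forecast(X_tr, y_tr, c_tr, X_te, c_te):
    """Fit one model per cluster and route each test point to its cluster's model.

    Test cluster labels are taken as known; test points whose cluster has no
    training data are left as NaN.
    """
    y_hat = np.full(len(X_te), np.nan)
    for c in np.unique(c_tr):
        model = fit_linear(X_tr[c_tr == c], y_tr[c_tr == c])
        mask = c_te == c
        if mask.any():
            y_hat[mask] = predict_linear(model, X_te[mask])
    return y_hat

# Synthetic data with two regimes that a single linear model cannot represent.
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 3.0, (400, 1))
c = rng.integers(0, 2, 400)
y = np.where(c == 0, 1.0 + 2.0 * X[:, 0], 7.0 - 2.0 * X[:, 0]) + 0.1 * rng.standard_normal(400)
tr, te = slice(0, 300), slice(300, 400)

y_hat_cluster = clustered_forecast(X[tr], y[tr], c[tr], X[te], c[te])
pct_global = rmse_pct(y[te], predict_linear(fit_linear(X[tr], y[tr]), X[te]))
pct_cluster = rmse_pct(y[te], y_hat_cluster)
print(f"single model: {pct_global:.1f}%  per-cluster models: {pct_cluster:.1f}%")
```

The gain here is exaggerated by design; with real load data the regimes overlap, which is why the reported improvements are in the order of 9-14% rather than an order of magnitude.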
We emphasize that the results in Figure 8 assume that each daily profile is modeled by the correct SMBM of its corresponding cluster; that is, forecasts are carried out for daily profiles that have already been assigned to the correct cluster. In practice, a daily load profile to be forecast must first be assigned to a cluster and then modeled with that cluster's SMBM. So far, we have used various calendar information to predict the cluster assignment and classify daily profiles into their clusters; however, this approach has not given sufficient classification accuracy. We aim to address this limitation in a future study.

Conclusion
Individual household electricity loads pose great challenges for forecast models. So far, results in the literature lag behind those obtained for larger-scale loads, which exhibit more predictable and stable load profiles, indicating room for improvement in predicting these highly volatile profiles.
Our analysis has shown that different households show different load characteristics, which affect model performance; in particular, load volatility and standard deviation are two important parameters. This suggests that different types of models could be used for different households according to their load profile characteristics. For example, a simple model could be used for a household with a more stable profile, whereas a household with higher volatility may require a more dynamic and responsive model. The data resolution versus forecast horizon analysis has shown that model performance stays relatively constant for horizons other than hour-ahead. This indicates that smart meter technologies at the individual household scale may gain little from using a horizon shorter than 24 h unless it is hour-ahead. These results are representative only of the studied household stock and must be confirmed for a broader range of households. Few studies have so far investigated the importance of load forecast resolution for smart grid technologies. Nonetheless, our analysis shows that higher data resolutions result in higher forecast errors.
We have also shown that assisting machine learning algorithms through a pre-clustering step significantly improves forecast performance. Although the pre-clustering step increases the complexity of the process, the improvements obtained appear to justify it. As previously mentioned, a future study will look for an effective way to assign forecasted daily profiles to the appropriate cluster.

Fig. 2. Forecast RMSE performance of the models vs. household load standard deviation.

Fig. 3. (a) Accuracy metrics versus standard deviation. (b) Real vs. predicted loads at hourly resolution and 24 h-ahead horizon by SVR for 5 representative households.

Fig. 4. RMSE results of resolution versus horizon analysis on chosen households.

Fig. 5. Load profile of Household 11 at four different data resolutions between Jan 01 and Jan 15 2015.
This research was funded by the CRC for Low Carbon Living Ltd, supported by the Cooperative Research Centers program, an Australian Government initiative. One of the authors (BY) particularly acknowledges scholarship funding provided by the Cooperative Research Centre for Low Carbon Living Ltd and the ongoing support of Solar Analytics Pty Ltd, an automated fault monitoring service for rooftop solar energy systems.

Table 1. Summary of the methodology used for the forecasting and clustering analysis.
Test the models on the independent test set and store the results in the "Predictions" vector
Store results separately for each cluster and also for the entire clustered data in Predictions_wCluster
10 Repeat the first 9 steps for each of the 10 folds.
Calculate error metrics and compare the results obtained by Predictions_wCluster and Predictions
11 Calculate error metrics by using the "Predictions" vector and the corresponding real load values

B. Yildiz et al.: Renew. Energy Environ. Sustain. 3, 3 (2018)