Heuristic Feature Selection for Smart Grid Attack Detection

Using heuristic optimization algorithms for feature selection to improve the efficiency of cyber-attack classification

 

Abstract

Heuristic algorithms such as Genetic Algorithm and Particle Swarm Optimization are used to select an ideal subset of features that can improve the classification accuracy and reduce training time of machine learning classifiers. Three heuristic algorithms are used along with three classifiers and tested in the application of cyber-attack detection in power systems. The tests are performed on three IEEE standard power systems of varying sizes. Results show that genetic algorithms are effective in selecting subsets of features that significantly reduce the training time while maintaining and, in some cases, increasing the test accuracy.

Data

The data used in this experiment are generated from the IEEE 14-bus, IEEE 57-bus, and IEEE 118-bus systems using the MATPOWER library. The measurement data consist of branch and bus power flows, which are related to the state variables, the bus voltage angles, through the Jacobian matrix. False Data Injection (FDI) attacks are simulated according to their mathematical formulation as explained in the literature (refer to publication). Following this process, 10,000 measurement instances are generated as training data, half of which are corrupted by an FDI attack. Another 1,000 instances are generated as testing data to calculate the classification accuracy of each method.
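The data-generation step can be illustrated with a minimal sketch. It assumes the linearized (DC) measurement model z = Hx + e and the commonly cited attack construction a = Hc, which leaves the state-estimation residual unchanged; the text above does not spell out the exact formulation, so both are assumptions. A random matrix stands in for the MATPOWER-derived Jacobian, and the function name, noise levels, attack scale, the 13-state dimension, and the half-attacked test set are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_measurements(H, n_samples, attack_fraction=0.5,
                          noise_std=0.01, attack_scale=0.1, seed=0):
    """Return (Z, y): measurements z = Hx + e, with a = Hc added to attacked rows."""
    rng = np.random.default_rng(seed)
    n_meas, n_state = H.shape
    # Hypothetical state samples (bus voltage angles, in radians).
    x = rng.normal(0.0, 0.1, size=(n_samples, n_state))
    z = x @ H.T + rng.normal(0.0, noise_std, size=(n_samples, n_meas))
    y = np.zeros(n_samples, dtype=int)
    # Pick which instances are attacked, without repetition.
    n_attacked = int(round(attack_fraction * n_samples))
    attacked = rng.choice(n_samples, size=n_attacked, replace=False)
    # c is the error the attacker wants to inject into the estimated state.
    c = rng.normal(0.0, attack_scale, size=(n_attacked, n_state))
    z[attacked] += c @ H.T   # attack vector a = Hc (assumed formulation)
    y[attacked] = 1          # label 1 = FDI attack, 0 = normal
    return z, y

# Stand-in Jacobian: 34 measurements (as in Table 1) and 13 states (assumed).
H = np.random.default_rng(42).normal(size=(34, 13))
Z_train, y_train = simulate_measurements(H, 10_000, seed=1)
Z_test, y_test = simulate_measurements(H, 1_000, seed=2)
print(Z_train.shape, y_train.mean(), Z_test.shape)
```

In a real experiment, H would come from the power-flow model of the chosen IEEE test system rather than from a random generator.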

Procedure

The experimental process consists of two main steps. First, the classification algorithms, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Artificial Neural Network (ANN), are cross-validated over varying parameters using all original features of each system. The goal is to obtain optimal parameters for each algorithm, which are then used for the remainder of the experiment. Second, the three feature selection (FS) techniques, Genetic Algorithm (GA), Binary Particle Swarm Optimization (BPSO), and Binary Cuckoo Search (BCS), are tested with the three classification algorithms using the optimal parameters obtained in the first step.

Results - Part 1: Classifier Cross-Validation

The parameters of each supervised learning algorithm are optimized through cross-validation, keeping the values that give the highest accuracy. The figures show the accuracy of each of the three algorithms with varying parameters on the IEEE 14-bus system. SVM is cross-validated over the kernel coefficient, gamma, and the penalty parameter, C; KNN is cross-validated over the number of neighbours, K; and ANN is cross-validated over the learning rate, alpha. The data used for this cross-validation consist of all the measurements of the system. The optimal parameters of each learning algorithm are selected based on the maximum accuracy achieved on the IEEE 14-bus system with no FS.
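The cross-validation step described above can be sketched with scikit-learn's GridSearchCV. This is an illustration under stated assumptions, not the configuration used in the experiment: the data are synthetic, the parameter grids, hidden-layer size, and 5-fold setting are hypothetical, and features are standardized before fitting. Note that in scikit-learn's MLPClassifier the learning rate is `learning_rate_init` (its `alpha` argument is an L2 penalty), so that is the parameter searched here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 34-feature IEEE 14-bus measurement set.
X, y = make_classification(n_samples=400, n_features=34,
                           n_informative=10, random_state=0)

# Hypothetical grids; make_pipeline names each step after its lowercase class.
searches = {
    "SVM": (SVC(kernel="rbf"),
            {"svc__C": [0.1, 1, 10], "svc__gamma": [0.001, 0.01, 0.1]}),
    "KNN": (KNeighborsClassifier(),
            {"kneighborsclassifier__n_neighbors": [3, 5, 7, 9]}),
    "ANN": (MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=0),
            {"mlpclassifier__learning_rate_init": [0.001, 0.01, 0.1]}),
}

best_params = {}
for name, (estimator, grid) in searches.items():
    pipeline = make_pipeline(StandardScaler(), estimator)
    search = GridSearchCV(pipeline, grid, cv=5, scoring="accuracy")
    search.fit(X, y)
    best_params[name] = search.best_params_
    print(f"{name}: {search.best_params_}, CV accuracy = {search.best_score_:.3f}")
```

The parameters stored in `best_params` would then be fixed for every later run, mirroring the way the optimal values are reused in the second step of the procedure.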

Results - Part 2: Effects of Heuristic Feature Selection

The three feature selection (FS) methods, BCS, BPSO, and GA, are implemented. The subset of features selected by each method is tested with the three classification algorithms, SVM, KNN, and ANN, and the classification accuracy on each of the three IEEE bus systems is recorded in Tables 1 to 3.
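To make the idea of heuristic FS concrete, the following is a minimal sketch of a genetic algorithm over binary feature masks. It is not the implementation used in this work: the fitness function (3-fold KNN accuracy minus a small penalty on the fraction of features kept), tournament selection, single-point crossover, bit-flip mutation, elitism, and all numeric settings are assumptions chosen for brevity, and the data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y, penalty=0.01):
    """Mean CV accuracy on the selected columns, minus a penalty on subset size."""
    if not mask.any():            # an empty subset cannot be classified
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    acc = cross_val_score(clf, X[:, mask], y, cv=3).mean()
    return acc - penalty * mask.mean()

def ga_select(X, y, pop_size=12, generations=10, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5     # random boolean masks
    fits = np.array([fitness(ind, X, y) for ind in pop])
    for _ in range(generations):
        children = [pop[fits.argmax()].copy()]     # elitism: keep the best mask
        while len(children) < pop_size:
            parents = []
            for _ in range(2):                     # binary tournament selection
                i, j = rng.choice(pop_size, size=2, replace=False)
                parents.append(pop[i] if fits[i] >= fits[j] else pop[j])
            cut = rng.integers(1, n_feat)          # single-point crossover
            child = np.concatenate([parents[0][:cut], parents[1][cut:]])
            child ^= rng.random(n_feat) < p_mut    # bit-flip mutation
            children.append(child)
        pop = np.array(children)
        fits = np.array([fitness(ind, X, y) for ind in pop])
    best = fits.argmax()
    return pop[best], fits[best]

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)
mask, score = ga_select(X, y)
print(f"kept {mask.sum()} of {mask.size} features, fitness = {score:.3f}")
```

The size penalty is one simple way to favour smaller subsets; BPSO and BCS can reuse the same fitness function and differ only in how candidate masks are updated.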

Results show that SVM and KNN are successful at detecting FDI attacks in all three IEEE bus systems. SVM is the most versatile classifier, scoring the highest classification accuracy under every FS method and on all three test systems. Furthermore, all three heuristic FS methods proved successful at reducing the number of features. GA produced the most successful results among the three FS methods, achieving the highest or near-highest classification accuracy with the fewest features. ANNs with the proposed architecture were unsuccessful at detecting FDI attacks on the 57-bus and 118-bus systems, where accuracy stayed near 50% regardless of the FS method.

Table 1: Accuracy of classifiers with each FS method on IEEE 14-bus

FS Method   # Features   SVM      KNN      ANN
NO FS       34           90.79%   80.28%   81.78%
BCS         11           90.69%   81.38%   77.08%
BPSO        8            90.19%   81.68%   79.18%
GA          8            90.49%   82.28%   79.28%

Table 2: Accuracy of classifiers with each FS method on IEEE 57-bus

FS Method   # Features   SVM      KNN      ANN
NO FS       137          88.29%   83.08%   50.05%
BCS         94           88.59%   84.48%   50.15%
BPSO        130          87.39%   83.58%   48.25%
GA          56           87.39%   85.59%   50.95%

Table 3: Accuracy of classifiers with each FS method on IEEE 118-bus

FS Method   # Features   SVM      KNN      ANN
NO FS       304          84.88%   74.57%   53.05%
BCS         199          83.58%   75.48%   51.25%
BPSO        160          83.28%   76.68%   51.95%
GA          122          90.59%   78.18%   50.05%
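The trade-off between subset size and accuracy can be read directly from the three tables. The short script below does that arithmetic for GA with SVM; the numbers are copied from Tables 1 to 3, and the variable names are illustrative only.

```python
# (number of features, SVM accuracy in %) taken from Tables 1 to 3.
no_fs = {"IEEE 14-bus": (34, 90.79), "IEEE 57-bus": (137, 88.29), "IEEE 118-bus": (304, 84.88)}
ga = {"IEEE 14-bus": (8, 90.49), "IEEE 57-bus": (56, 87.39), "IEEE 118-bus": (122, 90.59)}

summary = {}
for system, (n_all, acc_all) in no_fs.items():
    n_ga, acc_ga = ga[system]
    reduction = 100 * (1 - n_ga / n_all)   # percentage of features removed
    delta = acc_ga - acc_all               # change in accuracy, percentage points
    summary[system] = (round(reduction, 1), round(delta, 2))
    print(f"{system}: {reduction:.1f}% fewer features, SVM accuracy change {delta:+.2f} points")
```

By these figures, GA removes roughly 60% to 76% of the features, at a cost of under one percentage point of SVM accuracy on the two smaller systems and a gain of 5.71 points on the 118-bus system.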

Conclusion

The inability of current defence mechanisms to detect FDI attacks calls for alternative methods of detection. In this paper, supervised learning algorithms are implemented and shown to be successful at detecting FDI attacks when tested on the IEEE 14-bus, 57-bus, and 118-bus systems. Furthermore, heuristic FS methods were successful at maintaining, and sometimes increasing, the classification accuracy with a significantly smaller number of features. The SVM and KNN algorithms proved more accurate and versatile across the three systems than the ANN implemented in this paper. However, ANNs with more complex architectures are expected to perform better on larger systems, at a higher computational cost.

All FS methods were successful at increasing accuracy or reducing the number of features, and in some cases both. The classification results indicate that GA is the most efficient heuristic FS method for power systems in terms of accuracy and number of features. SVM with GA proved to be the most accurate and versatile combination across the three systems.