Peer-Reviewed

Research on Feature Selection in Power User Identification

Received: 18 April 2018     Accepted: 7 May 2018     Published: 1 June 2018
Abstract

In previous studies of user identification, most researchers focused on improving the recognition algorithm. In this paper, we use big data technology to extract electricity consumption features from several angles and study the impact of different features on recognition. First, the raw data are cleaned. To capture the key information for identifying electricity-theft users, features are then extracted from the data set in three categories: basic attribute features, statistical features at different time scales, and similarity features at different time scales. Different combinations of these feature sets are then evaluated with the KNN, random forest (RF), and XGBoost models. With all three classifiers, the BF+SF+PF feature set gives clearly better results than the other two feature sets. We therefore conclude that the choice of features has a marked effect on recognition results.
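The three feature categories named in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistics, time scales, or similarity measure were used, so everything specific here is an assumption made for illustration: 7-day and 28-day windows, mean/standard deviation/min/max as the statistics, Pearson correlation between consecutive windows as the similarity measure, and all function and feature names (`statistical_features`, `similarity_features`, `contract_kw`, and so on) are hypothetical.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def windows(daily, size):
    """Split a daily series into consecutive, non-overlapping windows of `size` days."""
    return [daily[i:i + size] for i in range(0, len(daily) - size + 1, size)]


def statistical_features(daily, scales=(7, 28)):
    """Summary statistics of per-window consumption totals at each assumed time scale."""
    feats = {}
    for size in scales:
        totals = [sum(w) for w in windows(daily, size)]
        feats[f"mean_{size}d"] = mean(totals)
        feats[f"std_{size}d"] = pstdev(totals)
        feats[f"min_{size}d"] = min(totals)
        feats[f"max_{size}d"] = max(totals)
    return feats


def similarity_features(daily, scales=(7,)):
    """Mean correlation between consecutive windows: low values suggest a changed usage pattern."""
    feats = {}
    for size in scales:
        ws = windows(daily, size)
        corrs = [pearson(a, b) for a, b in zip(ws, ws[1:])]
        feats[f"sim_{size}d"] = mean(corrs) if corrs else 0.0
    return feats


def build_feature_vector(basic, daily):
    """Combine basic attributes with statistical and similarity features for one user."""
    feats = dict(basic)
    feats.update(statistical_features(daily))
    feats.update(similarity_features(daily))
    return feats


# Toy user: the same weekly profile repeated for 8 weeks (illustrative values only).
daily = [1, 2, 3, 4, 5, 6, 7] * 8
feats = build_feature_vector({"contract_kw": 10}, daily)
```

Feature dictionaries built this way, one per user, could then be restricted to different subsets of keys and passed to each classifier, so that the effect of each feature group is compared on the same data.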

Published in Mathematics and Computer Science (Volume 3, Issue 3)
DOI 10.11648/j.mcs.20180303.11
Page(s) 67-76
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2018. Published by Science Publishing Group

Keywords

Feature Selection, Power User Identification, KNN, Random Forest, XGBoost

Cite This Article
  • APA Style

    Qiu Yanhao, Song Xiaoyu, Sun Xiangyang, Zhao Yang. (2018). Research on Feature Selection in Power User Identification. Mathematics and Computer Science, 3(3), 67-76. https://doi.org/10.11648/j.mcs.20180303.11

    ACS Style

    Qiu Yanhao; Song Xiaoyu; Sun Xiangyang; Zhao Yang. Research on Feature Selection in Power User Identification. Math. Comput. Sci. 2018, 3(3), 67-76. doi: 10.11648/j.mcs.20180303.11

    AMA Style

    Qiu Yanhao, Song Xiaoyu, Sun Xiangyang, Zhao Yang. Research on Feature Selection in Power User Identification. Math Comput Sci. 2018;3(3):67-76. doi: 10.11648/j.mcs.20180303.11

  • @article{10.11648/j.mcs.20180303.11,
      author = {Qiu Yanhao and Song Xiaoyu and Sun Xiangyang and Zhao Yang},
      title = {Research on Feature Selection in Power User Identification},
      journal = {Mathematics and Computer Science},
      volume = {3},
      number = {3},
      pages = {67-76},
      doi = {10.11648/j.mcs.20180303.11},
      url = {https://doi.org/10.11648/j.mcs.20180303.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.mcs.20180303.11},
     year = {2018}
    }
    

  • TY  - JOUR
    T1  - Research on Feature Selection in Power User Identification
    AU  - Qiu Yanhao
    AU  - Song Xiaoyu
    AU  - Sun Xiangyang
    AU  - Zhao Yang
    Y1  - 2018/06/01
    PY  - 2018
    N1  - https://doi.org/10.11648/j.mcs.20180303.11
    DO  - 10.11648/j.mcs.20180303.11
    T2  - Mathematics and Computer Science
    JF  - Mathematics and Computer Science
    JO  - Mathematics and Computer Science
    SP  - 67
    EP  - 76
    PB  - Science Publishing Group
    SN  - 2575-6028
    UR  - https://doi.org/10.11648/j.mcs.20180303.11
    VL  - 3
    IS  - 3
    ER  - 

Author Information
  • College of Engineering, Virginia Polytechnic Institute and State University, Virginia, The United States

  • School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou, China

  • School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou, China

  • School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou, China
