Significance of the Gradient Boosting Algorithm in Data Management Systems

Authors

  • Md Saikat Hosen Capital Normal University
  • Ruhul Amin Bangladesh Bank

DOI:

https://doi.org/10.18034/ei.v9i2.559

Keywords:

Gradient Boosting, Data Science, Data Management System, Boosting Algorithm

Abstract

In gradient boosting machines, the learning procedure consecutively fits new models to provide a more accurate estimate of the response variable. The principal idea behind this algorithm is to construct each new base learner so that it is maximally correlated with the negative gradient of the loss function associated with the whole ensemble. The loss function may be chosen arbitrarily; for the sake of intuition, if it is the classic squared-error loss, the learning procedure reduces to consecutive fitting of the residual errors. This study aims to delineate the significance of the gradient boosting algorithm in data management systems, dwelling in particular on its significance for text classification and on the limitations of the model. The basic methodology and the base-learner algorithm of gradient boosting, as originally formulated by Friedman, are presented here, so the article may also serve as an introduction to gradient boosting algorithms. Both the theoretical framework and the design choices are described and illustrated. We examine all the essential stages of designing a particular model for one's experimental needs. Interpretation issues are addressed and presented as an integral part of the analysis. Finally, the capabilities of gradient boosting algorithms are examined on a set of real-world practical applications, such as text classification.
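
To make the residual-fitting view in the abstract concrete, below is a minimal sketch of Friedman-style gradient boosting under squared-error loss, for which the negative gradient at each stage is exactly the current residual vector. This is an illustrative sketch, not the authors' implementation: the function names, hyperparameter values, and the choice of scikit-learn regression trees as base learners are assumptions made here for demonstration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_stages=100, learning_rate=0.1, max_depth=3):
    """Sketch of gradient boosting with squared-error loss L(y, F) = (y - F)^2 / 2.

    The negative gradient of L with respect to F is the residual y - F,
    so each new base learner is fit to the errors of the current ensemble.
    """
    f0 = float(np.mean(y))                 # best constant model initializes the ensemble
    prediction = np.full(len(y), f0)
    learners = []
    for _ in range(n_stages):
        residuals = y - prediction         # negative gradient under squared-error loss
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)             # new base learner correlates with the negative gradient
        prediction = prediction + learning_rate * tree.predict(X)
        learners.append(tree)
    return f0, learners

def gradient_boost_predict(X, f0, learners, learning_rate=0.1):
    """Sum the constant model and the shrunken contributions of all base learners."""
    return f0 + learning_rate * sum(tree.predict(X) for tree in learners)

# Illustrative run on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=200)
f0, learners = gradient_boost_fit(X, y)
print(gradient_boost_predict(X[:5], f0, learners))
```

For a text-classification task such as the one discussed in the article, the same scheme would typically be applied to TF-IDF feature vectors with a classification loss in place of squared error, for example via scikit-learn's GradientBoostingClassifier.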


Author Biographies

Md Saikat Hosen, Capital Normal University

College of Management, Capital Normal University, Haidian District, Beijing, CHINA

Ruhul Amin, Bangladesh Bank

Senior Data Entry Control Operator (IT), ED-Maintenance Office, Bangladesh Bank (Head Office), Dhaka, BANGLADESH

References

Bache, K., and Lichman, M. 2013. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Sciences. Available online at: http://archive.ics.uci.edu/ml/citation_policy.html

Bissacco, A., Yang, M.-H., and Soatto, S. 2007. “Fast human pose estimation using appearance and motion via multi-dimensional boosting regression,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR’07 (Minneapolis, MN). doi: 10.1109/CVPR.2007.383129

Breiman, L. 2001. Random forests. Mach. Learn. 45, 5–32. doi: 10.1023/A:1010933404324

Bullmore, E., and Sporns, O. 2009. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10, 186–198. doi: 10.1038/nrn2575

Chen, H., Tino, P., and Yao, X. 2009. Predictive ensemble pruning by expectation propagation. IEEE Trans. Knowl. Data Eng. 7, 999–1013. doi: 10.1109/TKDE.2009.62

Ciarelli, P., and Oliveira, E. 2009. “Agglomeration and elimination of terms for dimensionality reduction,” in Ninth International Conference on Intelligent Systems Design and Applications, ISDA'09 (Pisa), 547–552. doi: 10.1109/ISDA.2009.9

Ciarelli, P., Salles, E., and Oliveira, E. 2010. “An evolving system based on probabilistic neural network,” in Eleventh Brazilian Symposium on Neural Networks (SBRN) (Sao Paulo), 182–187. doi: 10.1109/SBRN.2010.39

Clemencon, S., and Vayatis, N. 2009. Tree-based ranking methods. IEEE Trans. Inf. Theory 55, 4316–4336. doi: 10.1109/TIT.2009.2025558

Cotter, A., Shamir, O., Srebro, N., and Sridharan, K. 2011. “Better mini-batch algorithms via accelerated gradient methods,” in Advances in Neural Information Processing Systems 24 eds J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger (Cambridge, MA: MIT Press), 1647–1655. Available online at: http://books.nips.cc/papers/files/nips24/NIPS2011_0942.pdf

Du, J., Hu, Y., and Jiang, H. 2011. Boosted mixture learning of Gaussian mixture Hidden Markov models based on maximum likelihood for speech recognition. IEEE Trans. Audio Speech Lang. Process. 19, 2091–2100. doi: 10.1109/TASL.2011.2112352

Friedman, J. 2001. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232. doi: 10.1214/aos/1013203451


Friedman, J., Hastie, T., and Tibshirani, R. 2000. Additive logistic regression: a statistical view of boosting. Ann. Stat. 28, 337–407. doi: 10.1214/aos/1016218222

Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y. 2017. “LightGBM: a highly efficient gradient boosting decision tree,” in 31st Conference on Neural Information Processing Systems (NIPS 2017) (Long Beach, CA), 1–9.

Hansen, L., and Salamon, P. 1990. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12, 993–1001. doi: 10.1109/34.58871

Hu, T., Li, X., and Zhao, Y. 2006. “Gradient boosting learning of Hidden Markov models,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'06) (Toulouse). doi: 10.1109/ICASSP.2006.1660233

Hu, Y. F. 2005. Efficient and high quality force-directed graph drawing. Math. J. 10, 37–71. Available online at: http://www.mathematica-journal.com/issue/v10i1/graph_draw.html

Johnson, R., and Zhang, T. 2012. Learning Nonlinear Functions Using Regularized Greedy Forest. Technical Report. arXiv:1109.0887. doi: 10.2172/1052139

Kulkarni, V., and Sinha, P. 2012. “Pruning of random forest classifiers: a survey and future directions,” in International Conference on Data Science Engineering (ICDSE) (Cochin, Kerala), 64–68. doi: 10.1109/ICDSE.2012.6282329

Latora, V., and Marchiori, M. 2001. Efficient behavior of small-world networks. Phys. Rev. Lett. 87:198701. doi: 10.1103/PhysRevLett.87.198701

Liu, Y., Wang, Y., Li, Y., Zhang, B., and Wu, G. 2004. “Earthquake prediction by RBF neural network ensemble,” in Advances in Neural Networks - ISNN 2004, eds F.-L. Yin, J. Wang, and C. Guo (Berlin; Heidelberg: Springer), 962–969. doi: 10.1007/978-3-540-28648-6_153

Lotte, F., Congedo, M., Lécuyer, A., Lamarche, F., and Arnaldi, B. 2007. A review of classification algorithms for EEG-based brain-computer interfaces. J. Neural Eng. 4, R1–R13. doi: 10.1088/1741-2560/4/2/R01

Pittman, S. J., and Brown, K. A. 2011. Multi-scale approach for predicting fish species distributions across coral reef seascapes. PLoS ONE 6:e20583. doi: 10.1371/journal.pone.0020583

Qi, Y. 2012. “Random forest for bioinformatics,” in Ensemble Machine Learning, eds C. Zhang and Y. Ma (New York, NY: Springer), 307. doi: 10.1007/978-1-4419-9326-7_11

Schapire, R. 2002. The boosting approach to machine learning: an overview. Nonlin. Estimat. Classif. Lect. Notes Stat. 171, 149–171. doi: 10.1007/978-0-387-21579-2_9

Sewell, M. 2011. Ensemble Learning. Technical Report, Department of Computer Science, University College London. Available online at: http://www.cs.ucl.ac.uk/fileadmin/UCL-CS/research/Research_Notes/RN_11_02.pdf

Upadhyay, D., Manero, J., Zaman, M., and Sampalli, S. 2020. Gradient boosting feature selection with machine learning classifiers for intrusion detection on power grids. IEEE Transactions on Network and Service Management, 1–14.

Vassallo, D., Vella, V., and Ellul, J. 2021. Application of gradient boosting algorithms for anti-money laundering in cryptocurrencies. SN Computer Science, 2(143), 1–15.


Published

2021-07-20

How to Cite

Hosen, M. S., & Amin, R. (2021). Significant of Gradient Boosting Algorithm in Data Management System. Engineering International, 9(2), 85–100. https://doi.org/10.18034/ei.v9i2.559

Issue

Vol. 9 No. 2 (2021)

Section

Peer Reviewed Articles