Significance of the Gradient Boosting Algorithm in Data Management Systems
DOI: https://doi.org/10.18034/ei.v9i2.559
Keywords: Gradient Boosting, Data Science, Data Management System, Boosting Algorithm
Abstract
In gradient boosting machines, the learning process successively fits new models to provide an ever more accurate approximation of the response variable. The principal idea behind the algorithm is that each new base learner is constructed to be maximally correlated with the negative gradient of the loss function of the whole ensemble. The loss function can be arbitrary; to make the idea concrete, when the error function is the classic squared-error loss, learning reduces to sequentially fitting the current residuals. This study delineates the significance of the gradient boosting algorithm in data management systems, dwelling in particular on its role in text classification and on the limitations of the model. The basic methodology and the base-learning algorithm of gradient boosting, as originally formulated by Friedman, are presented here and may serve as an introduction to this family of algorithms. Both the theoretical framework and the design choices are described and illustrated, and all the basic stages of designing a particular model for one's experimental needs are examined. Interpretation issues are addressed and presented as an essential part of the analysis. Finally, the capabilities of gradient boosting algorithms are examined on a set of real-world practical applications such as text classification.
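To make the squared-error case concrete, the minimal Python sketch below shows why fitting the negative gradient reduces to fitting residuals: for the loss L(y, F) = 0.5 * (y - F)^2, the negative gradient with respect to F is simply y - F, so each new base learner is trained on the errors of the current ensemble. The synthetic data, learning rate, and tree depth are illustrative assumptions, not values from the study; NumPy and scikit-learn are assumed to be available.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (an illustrative assumption, not from the study).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 6.0, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.1, size=200)

n_stages, learning_rate = 100, 0.1
prediction = np.full_like(y, y.mean())  # F_0: best constant model under squared error

for _ in range(n_stages):
    # For L(y, F) = 0.5 * (y - F)^2, the negative gradient w.r.t. F is the residual.
    residuals = y - prediction
    # Fit the new base learner to the negative gradient (here: a shallow tree).
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    # Shrink the update; the learning rate trades training fit against overfitting.
    prediction += learning_rate * tree.predict(X)

print("training MSE:", np.mean((y - prediction) ** 2))

With a different loss, only the residuals line changes: the base learner is fitted to whatever the negative gradient of that loss is, which is what makes the framework general.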
References
Bache, K., and Lichman, M. 2013. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Sciences. Available online at: http://archive.ics.uci.edu/ml/citation_policy.html
Bissacco, A., Yang, M.-H., and Soatto, S. 2007. “Fast human pose estimation using appearance and motion via multi-dimensional boosting regression,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR'07 (Minneapolis, MN). doi: 10.1109/CVPR.2007.383129
Breiman, L. 2001. Random forests. Mach. Learn. 45, 5–32. doi: 10.1023/A:1010933404324
Bullmore, E., and Sporns, O. 2009. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10, 186–198. doi: 10.1038/nrn2575
Chen, H., Tino, P., and Yao, X. 2009. Predictive ensemble pruning by expectation propagation. IEEE Trans. Knowl. Data Eng. 7, 999–1013. doi: 10.1109/TKDE.2009.62
Ciarelli, P., and Oliveira, E. 2009. “Agglomeration and elimination of terms for dimensionality reduction,” in Ninth International Conference on Intelligent Systems Design and Applications, ISDA'09 (Pisa), 547–552. doi: 10.1109/ISDA.2009.9
Ciarelli, P., Salles, E., and Oliveira, E. 2010. “An evolving system based on probabilistic neural network,” in Eleventh Brazilian Symposium on Neural Networks (SBRN) (Sao Paulo), 182–187. doi: 10.1109/SBRN.2010.39
Clemencon, S., and Vayatis, N. 2009. Tree-based ranking methods. IEEE Trans. Inf. Theory 55, 4316–4336. doi: 10.1109/TIT.2009.2025558
Cotter, A., Shamir, O., Srebro, N., and Sridharan, K. 2011. “Better mini-batch algorithms via accelerated gradient methods,” in Advances in Neural Information Processing Systems 24 eds J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger (Cambridge, MA: MIT Press), 1647–1655. Available online at: http://books.nips.cc/papers/files/nips24/NIPS2011_0942.pdf
Du, J., Hu, Y., and Jiang, H. 2011. Boosted mixture learning of Gaussian mixture Hidden Markov models based on maximum likelihood for speech recognition. IEEE Trans. Audio Speech Lang. Process. 19, 2091–2100. doi: 10.1109/TASL.2011.2112352
Friedman, J. 2001. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232. doi: 10.1214/aos/1013203451
Friedman, J., Hastie, T., and Tibshirani, R. 2000. Additive logistic regression: a statistical view of boosting. Ann. Stat. 28, 337–407. doi: 10.1214/aos/1016218222
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y. 2017. “LightGBM: a highly efficient gradient boosting decision tree,” in 31st Conference on Neural Information Processing Systems (NIPS 2017) (Long Beach, CA), 1–9.
Hansen, L., and Salamon, P. 1990. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12, 993–1001. doi: 10.1109/34.58871
Hu, T., Li, X., and Zhao, Y. 2006. “Gradient boosting learning of Hidden Markov models,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'06) (Toulouse). doi: 10.1109/ICASSP.2006.1660233
Hu, Y. F. 2005. Efficient and high quality force-directed graph drawing. Math. J. 10, 37–71. Available online at: http://www.mathematica-journal.com/issue/v10i1/graph_draw.html
Johnson, R., and Zhang, T. 2012. Learning Nonlinear Functions Using Regularized Greedy Forest. Technical Report. arXiv:1109.0887. doi: 10.2172/1052139
Kulkarni, V., and Sinha, P. 2012. “Pruning of random forest classifiers: a survey and future directions,” in International Conference on Data Science Engineering (ICDSE) (Cochin, Kerala), 64–68. doi: 10.1109/ICDSE.2012.6282329
Latora, V., and Marchiori, M. 2001. Efficient behavior of small-world networks. Phys. Rev. Lett. 87:198701. doi: 10.1103/PhysRevLett.87.198701
Liu, Y., Wang, Y., Li, Y., Zhang, B., and Wu, G. 2004. “Earthquake prediction by RBF neural network ensemble,” in Advances in Neural Networks - ISNN 2004, eds F.-L. Yin, J. Wang, and C. Guo (Berlin; Heidelberg: Springer), 962–969. doi: 10.1007/978-3-540-28648-6_153
Lotte, F., Congedo, M., Lécuyer, A., Lamarche, F., and Arnaldi, B. 2007. A review of classification algorithms for EEG-based brain-computer interfaces. J. Neural Eng. 4, R1–R13. doi: 10.1088/1741-2560/4/2/R01
Pittman, S. J., and Brown, K. A. 2011. Multi-scale approach for predicting fish species distributions across coral reef seascapes. PLoS ONE 6:e20583. doi: 10.1371/journal.pone.0020583
Qi, Y. 2012. “Random forest for bioinformatics,” in Ensemble Machine Learning, eds C. Zhang and Y. Ma (New York, NY: Springer), 307. doi: 10.1007/978-1-4419-9326-7_11
Schapire, R. 2002. The boosting approach to machine learning: an overview. Nonlin. Estimat. Classif. Lect. Notes Stat. 171, 149–171. doi: 10.1007/978-0-387-21579-2_9
Sewell, M. 2011. Ensemble Learning. Technical Report, Department of Computer Science, University College London. Available online at: http://www.cs.ucl.ac.uk/fileadmin/UCL-CS/research/Research_Notes/RN_11_02.pdf
Upadhyay, D., Manero, J., Zaman, M., and Sampalli, S. 2020. Gradient boosting feature selection with machine learning classifiers for intrusion detection on power grids. IEEE Trans. Netw. Serv. Manage. 1–14.
Vassallo, D., Vella, V., and Ellul, J. 2021. Application of gradient boosting algorithms for anti-money laundering in cryptocurrencies. SN Comput. Sci. 2(143), 1–15.
License
Engineering International is an Open Access journal. Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a CC BY-NC 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of their work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal. We require authors to inform us of any instances of re-publication.