The Difficulty of Learning Long-Term Dependencies with Gradient Flow in Recurrent Nets
DOI: https://doi.org/10.18034/ei.v8i2.570
Keywords: Recurrent Networks, Back-Propagation through Time, Learning Long-Term Dependencies, Gradient Flow
Abstract
In theory, recurrent networks (RNs) can use their feedback connections to store representations of recent input events as activations. In practice, however, the most widely used methods for learning what to put in short-term memory either take far too long to be practical or do not work at all, especially when the time lags between inputs and the corresponding teacher signals are long. Although theoretically fascinating, they offer no significant practical advantage over backpropagation in feedforward networks with limited time windows. This article gives a succinct overview of this rapidly evolving topic, with a focus on recent advances. It also examines the asymptotic behavior of error gradients as a function of time lags to provide a theoretical treatment of the problem. The methodology was a review of scholarly research papers that address the difficulty of learning long-term dependencies with gradient flow in recurrent nets. Recurrent neural networks (RNNs) are the most general and powerful sequence learning algorithm currently available. Unlike Hidden Markov Models (HMMs), which have proven to be the most successful technique in a variety of sequence processing applications, they are not limited to discrete internal states and can represent continuous, distributed sequences. As a result, they can address problems that no other method can. Conventional RNNs, however, are difficult to train because of the vanishing gradient problem.
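The dependence of the error gradient on the time lag can be illustrated with a minimal sketch. It assumes a single-unit recurrent net h_t = tanh(w * h_{t-1} + x_t); the weight 0.9, the sinusoidal inputs, and the function name gradient_through_time are illustrative choices, not taken from the article. Because the derivative of tanh is at most 1, the magnitude of d h_t / d h_0 is bounded by |w|^t, so for |w| < 1 it shrinks at least exponentially with the lag.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_t / d h_0| for t = 1..T in the scalar RNN
    h_t = tanh(w * h_{t-1} + x_t) (a hypothetical toy model)."""
    h = h0
    grad = 1.0
    magnitudes = []
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * tanh'(w * h_{t-1} + x_t)
        #                               = w * (1 - h_t ** 2)
        grad *= w * (1.0 - h * h)
        magnitudes.append(abs(grad))
    return magnitudes


# Illustrative setup (assumed values): recurrent weight 0.9, 100 time steps.
w = 0.9
inputs = [0.5 * math.sin(0.3 * t) for t in range(100)]
mags = gradient_through_time(w, inputs)

for lag in (1, 10, 50, 100):
    print(f"lag {lag:3d}: |dh_t/dh_0| = {mags[lag - 1]:.3e}  "
          f"(bound |w|^t = {abs(w) ** lag:.3e})")
```

In this toy setting, an error signal injected at step t reaches step 0 scaled by the printed factor, which is why credit assignment over long lags becomes impractically slow. With |w| > 1 the product can instead grow, but only while the unit stays away from saturation.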
License
Engineering International is an Open Access journal. Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a CC BY-NC 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of their work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal. We require authors to inform us of any instances of re-publication.