The Difficulty of Learning Long-Term Dependencies with Gradient Flow in Recurrent Nets

Authors

  • Naresh Babu Bynagari, Career Soft Solutions Inc

DOI:

https://doi.org/10.18034/ei.v8i2.570

Keywords:

Recurrent Networks, Back-Propagation through Time, Learning Long-Term Dependencies, Gradient Flow

Abstract

In theory, recurrent neural networks (RNNs) can leverage their feedback connections to store activations as representations of recent input events. The most extensively used methods for learning what to put in short-term memory, however, take far too long to be practicable or do not work at all, especially when the time lags between inputs and teacher signals are long. Despite being theoretically fascinating, they do not provide significant practical advantages over backpropagation in feedforward networks with limited time windows. The goal of this article is to provide a succinct overview of this rapidly evolving topic, with a focus on recent advancements. We also examine the asymptotic behavior of error gradients as a function of time lags to provide a theoretical treatment of the topic. The methodology adopted in the study was to review scholarly research papers on the subject matter in order to address the difficulty of learning long-term dependencies with gradient flow in recurrent nets. RNNs are the most general and powerful sequence learning algorithms currently available. Unlike Hidden Markov Models (HMMs), which have proven to be the most successful technique in a variety of sequence processing applications, they are not limited to discrete internal states and can represent continuous, distributed sequences. As a result, they can address problems that no other method can. Conventional RNNs, however, are difficult to train due to the problem of vanishing gradients.
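To make the vanishing-gradient problem concrete, the short Python sketch below (an illustration under assumed settings, not code from the article; the network size, weight scale, and tanh nonlinearity are all arbitrary choices) tracks the norm of the Jacobian of the hidden state with respect to an earlier hidden state. Each backward step through time multiplies the error signal by one factor of the form diag(1 - h^2) W, so the norm printed for lag q reflects a product of q such factors and typically collapses toward zero as q grows:

import numpy as np

# Minimal sketch of the vanishing-gradient effect (illustrative assumptions,
# not code from the article): in a tanh RNN, back-propagating an error across
# q time steps multiplies it by one Jacobian diag(1 - h^2) @ W per step, so
# the accumulated factor shrinks roughly exponentially once each per-step
# factor has norm below one.

rng = np.random.default_rng(0)
n = 20                                   # hidden units (arbitrary choice)
W = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))  # recurrent weight matrix

h = np.zeros(n)
jac = np.eye(n)                          # accumulates d h_t / d h_(t-q)
for q in range(1, 51):
    net = W @ h + rng.normal(size=n)     # random inputs keep the state moving
    h = np.tanh(net)
    jac = (np.diag(1.0 - h ** 2) @ W) @ jac   # chain rule for one time step
    if q % 10 == 0:
        print(f"lag q={q:2d}   ||d h_t / d h_(t-q)||_F = {np.linalg.norm(jac):.3e}")

With these (assumed) settings the printed Frobenius norms typically fall by several orders of magnitude between lag 10 and lag 50; with substantially larger weights the same product can instead explode, the mirror image of the same gradient-flow difficulty.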


Author Biography

  • Naresh Babu Bynagari, Career Soft Solutions Inc

    Director of Sales, Career Soft Solutions Inc, 145 Talmadge Rd, Edison, NJ 08817, Middlesex, USA


Published

2020-12-22

Issue

Vol. 8 No. 2 (2020)

Section

Peer Reviewed Articles

How to Cite

Bynagari, N. B. (2020). The Difficulty of Learning Long-Term Dependencies with Gradient Flow in Recurrent Nets. Engineering International, 8(2), 127-138. https://doi.org/10.18034/ei.v8i2.570
