Talk:Backpropagation through time

"hidden state a_0" -- shouldn't this say x_0?

BPTT Disadvantages Accuracy.

Under the Disadvantages section, the text references "An Application of Non-Linear Programming to Train Recurrent Neural Networks in Time Series Prediction Problems" by M.P. Cuéllar et al. to justify the claim that RNNs are more susceptible to local optima than other models. However, when examining the cited work, the quote in question seems to be:

"In the case of Recurrent Neural Networks (D.P. Mandic et al., 2001), there are not as many training algorithms as for feedforward ones. These algorithms (M. Hüsken et al., 2003; R.J. Williams et al., 1989-1990) also share the same disadvantage as those used to train feedforward networks in that they get trapped in local optimal solutions very easily. In fact, this problem is greater in recurrent neural networks. Some evolutionary techniques have been proposed as a good choice to train these kind of networks (Blanco et al., 2001; M.P. Cuéllar et al., 2004), because they can overcome the local optimal solutions, but they have the drawback that the training stage takes too much time."

Note that the quote gives no direct citation for the claim that "this problem is greater in recurrent neural networks". The works it cites for the training algorithms (M. Hüsken et al., 2003; R.J. Williams et al., 1989-1990) do not mention local optima at all, let alone how likely RNNs are to become trapped in them. The works it cites for evolutionary techniques (Blanco et al., 2001; M.P. Cuéllar et al., 2004) do mention local optima, but make no specific claim and offer no evidence that RNNs are more likely to become trapped in them than any other model. The claim in the Disadvantages section therefore does not appear to be backed up by its sources.