Wikipedia:Peer review/Deep reinforcement learning/archive1


Deep reinforcement learning

I've listed this article for peer review because I've rewritten significant portions of it and want to confirm that it's still suitable and understandable for a general audience.

Thanks, Anair13 (talk) 09:26, 1 December 2020 (UTC)

Nice job! I was surprised to see you still had an empty talk page; I gave you a belated welcome :)
General comments:
  • A good rule of thumb is at least one explicit citation per paragraph, even for general/basic/introductory material, so readers have a clear pointer to the source of the information or to further detail. Some paragraphs here don't have a citation.
  • Your first source (Francois-Lavet) is great, but it seems underutilized - it's cited only once, in the lead. It looks like it could fill out citations for some of the other material below.
  • The history section cites Sutton and Barto 1998 - the 2018 edition is available online here and works well both as a source and as a further reading/external link.
  • Be careful about using companies' press releases, blog posts, and other non-peer-reviewed material as statements of fact; these are sometimes a little overhyped or promotional. I don't know about DeepMind's data center cooling claim, but the OpenAI Rubik's Cube announcement got some pushback, for example. I'd at least cite their arXiv preprint, not just the blog post (though the post is still worth including for being more accessible).
  • A few bits of jargon:
    • To make this more broadly accessible, I'd reword the jargony "state space" out of the lead paragraph and mention it in the body text instead, or at least wikilink the state space article.
    • The term "policy" is jargon here (related to, but distinct from, the everyday use of the word). I know it's covered in reinforcement learning too, but I'd give a brief explanation in words here - for a general audience, start with something very basic like "a policy defines an agent's behavior in response to its environment". For the same reason, I think the "off-policy" section is a little unclear, and you don't mention that Q-learning is off-policy (see the sketch after this list for the one-line difference that makes it so).
  • Minor formatting points:
    • Most wiki articles use sentence-case headings, so you don't need to capitalize beyond the first word in the section headers.
    • People jump around from section to section, skim on their phones, and so on, so relative-position references like "described below" aren't usually used in articles. If need be, link to the relevant section, though I think this article is short enough that it isn't an issue.
    • Citations 20-24 are conference papers without links - these are almost always available online, and it's always good to spare readers a step by linking to arXiv or wherever.
  • In case you're looking for additional material, this one's hot off the press! https://www.nature.com/articles/s41586-020-03051-4 [1]
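To make the off-policy point above concrete for anyone following along, here's a minimal tabular Q-learning sketch in Python. The Gym-style environment interface and the hyperparameter values are just illustrative assumptions on my part, not anything from the article:

    import numpy as np

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        # Tabular Q-learning on a Gym-style env with discrete state/action spaces.
        Q = np.zeros((env.observation_space.n, env.action_space.n))
        for _ in range(episodes):
            state = env.reset()
            done = False
            while not done:
                # Behavior policy: epsilon-greedy over the current Q estimates.
                if np.random.rand() < epsilon:
                    action = env.action_space.sample()
                else:
                    action = int(np.argmax(Q[state]))
                next_state, reward, done, _ = env.step(action)
                # Off-policy target: bootstrap from the greedy action,
                # max_a' Q(s', a'), not from whatever action the behavior
                # policy will actually take in s'.
                target = reward + gamma * (0.0 if done else np.max(Q[next_state]))
                Q[state, action] += alpha * (target - Q[state, action])
                state = next_state
        return Q

The entire on/off-policy distinction lives in the target line: SARSA (on-policy) would bootstrap from the action the epsilon-greedy behavior policy actually takes next, while Q-learning bootstraps from the greedy max regardless - that's the kind of thing the article's off-policy section could spell out in words.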

References

  1. ^ Schrittwieser, Julian; Antonoglou, Ioannis; Hubert, Thomas; Simonyan, Karen; Sifre, Laurent; Schmitt, Simon; Guez, Arthur; Lockhart, Edward; Hassabis, Demis; Graepel, Thore; Lillicrap, Timothy; Silver, David (24 December 2020). "Mastering Atari, Go, chess and shogi by planning with a learned model". Nature. 588 (7839): 604–609. doi:10.1038/s41586-020-03051-4.
Opabinia regalis (talk) 03:38, 24 December 2020 (UTC)
OK, thanks a lot for the feedback and the welcome! I'll incorporate it shortly. Anair13 (talk) 20:44, 26 December 2020 (UTC)