Help:Using the Wayback Machine: Difference between revisions

From Wikipedia, the free encyclopedia
m Removed references to robots.txt files
Line 63: Line 63:
== Limitations ==
Before October 2013 it would often take weeks or months for an archived copy of a web page to become available. Nowadays, a request to archive a particular web page is actioned immediately and the result usually made available within minutes.

The Internet Archive honors the [[robots exclusion standard]]. It will not archive sites that disallow access, and it will remove access to previous versions of a disallowed page.

For example, ''[[The New York Times]]'' has a robots.txt page at http://www.nytimes.com/robots.txt which includes:
User-agent: *
Disallow: /aponline/
Disallow: /archives/
Disallow: /reuters/
Thus, archive requests for URLs within those folders, and any other similarly listed folder of the New York Times website will be rejected.

''[[The Washington Post]]'' uses the file http://www.washingtonpost.com/robots.txt which includes:
User-agent: ia_archiver
Disallow: /
This directive explicitly blocks the Internet Archive from accessing their entire website.


== JavaScript bookmarklet ==

Revision as of 00:39, 15 February 2017

This page gives information about using the Wayback Machine to cite archived copies of web pages used by articles. This is useful if a web page has changed, moved, or disappeared; links to the original content can be retained.

Editors are also encouraged to add an archive link as a part of each citation, or at least submit the referenced URL for archiving, at the same time that each citation is created or updated.

Visit the web form at https://archive.org/, enter the original URL of the web page of interest in the "Wayback Machine" search box and then select BROWSE HISTORY. The next screen may

  • redirect to the latest archived copy,
  • show a box near the bottom of the page with a link inviting the user to Save this url in the Wayback Machine,
  • show a calendar listing the snapshot dates for all archived copies of that page, or
  • show an error message explaining why the page cannot be archived.

In short, this is the code that needs to be added to a reference:

<ref>{{<!--EXISTING REFERENCE-->|archive-url=https://web.archive.org/web/20021128120000/http://www.originalurl.com|archive-date=2002-11-28|access-date={{subst:YYYYMMDD|d}}|dead-url=yes}}</ref>

URL formats

A link to the Wayback Machine usually starts with https://web.archive.org/web/ followed either by a single asterisk or a 14-digit datetime reference, then a slash and finally the URL of the original web page.
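The format described above can be sketched as a small JavaScript function. This is only an illustration: the name waybackUrl is hypothetical and not part of any Wayback Machine interface, and the 14-digit check simply mirrors the description above.

```javascript
// Hypothetical helper (an assumption for illustration, not an official API):
// builds a Wayback Machine link in the format described above.
const WAYBACK_WEB_PREFIX = "https://web.archive.org/web/";

function waybackUrl(originalUrl, timestamp) {
  // No timestamp given: use a single asterisk in its place.
  if (timestamp === undefined) {
    return WAYBACK_WEB_PREFIX + "*/" + originalUrl;
  }
  // Otherwise expect a 14-digit datetime reference.
  if (!/^\d{14}$/.test(timestamp)) {
    throw new Error("timestamp must be exactly 14 digits");
  }
  return WAYBACK_WEB_PREFIX + timestamp + "/" + originalUrl;
}

console.log(waybackUrl("http://www.wikipedia.org/"));
console.log(waybackUrl("http://www.wikipedia.org/", "20020930123525"));
```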

Initial request

The following example usually shows a calendar linking to all archived copies of the main index page of Wikipedia.

Use the above URL format to discover the extent to which the requested page has been archived. Click one of the highlighted dates to select that specific archived copy.

It is possible to narrow down the request by providing a date code with fewer than 14 digits followed by * (this example displays only the archived snapshots matching December 2005).
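As a rough sketch of that narrowing, the function below places a partial date code and an asterisk between the usual prefix and the original URL. The function name partialDateCalendarUrl is hypothetical, and accepting any code of 1 to 13 digits is an assumption taken from the wording above rather than from Internet Archive documentation.

```javascript
// Hypothetical helper (illustration only): builds a calendar request that is
// narrowed by a partial date code, e.g. "200512" for December 2005.
function partialDateCalendarUrl(originalUrl, dateCode) {
  // Assumption: any code shorter than 14 digits is accepted.
  if (!/^\d{1,13}$/.test(dateCode)) {
    throw new Error("date code must be 1 to 13 digits");
  }
  return "https://web.archive.org/web/" + dateCode + "*/" + originalUrl;
}

// Request only the snapshots matching December 2005.
console.log(partialDateCalendarUrl("http://www.wikipedia.org/", "200512"));
```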

If the target web page hasn't yet been archived, a box appears near the bottom of the page with a link inviting the user to Save this url in the Wayback Machine. Clicking this invokes a request to

The above URL will show the current version of the requested web page and start the process that will attempt to archive it. If successful, the archived copy will become available as soon as the process is completed.

For some requested pages, the Wayback Machine will return an error message explaining why that particular page has not been and cannot be archived. In those cases, try a different archiving service such as WebCite.

Specific archive copy

Once the target web page has been archived, each of the specific dated archives can be individually requested using the format shown below.

The next example links to the archived copy of the main index page of Wikipedia exactly as it appeared on 30 September 2002 at 12:35:25 pm in the UTC timezone. The datetime format is YYYYMMDDhhmmss.

Use the above format to link directly to a specific archive copy.
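When the snapshot time is known as a date and time rather than as a 14-digit string, the YYYYMMDDhhmmss value can be produced mechanically. The following is a minimal sketch; toWaybackTimestamp is a hypothetical name, and the function reads only the UTC fields of the Date because the datetime is expressed in UTC.

```javascript
// Hypothetical helper (illustration only): formats a Date as a 14-digit
// datetime reference (YYYYMMDDhhmmss) using its UTC fields.
function toWaybackTimestamp(date) {
  const pad = (value, width) => String(value).padStart(width, "0");
  return (
    pad(date.getUTCFullYear(), 4) +
    pad(date.getUTCMonth() + 1, 2) + // getUTCMonth() is zero-based
    pad(date.getUTCDate(), 2) +
    pad(date.getUTCHours(), 2) +
    pad(date.getUTCMinutes(), 2) +
    pad(date.getUTCSeconds(), 2)
  );
}

// 30 September 2002 at 12:35:25 UTC (month index 8 is September).
const snapshotMoment = new Date(Date.UTC(2002, 8, 30, 12, 35, 25));
console.log(toWaybackTimestamp(snapshotMoment)); // "20020930123525"
```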

Adding an asterisk immediately after the date (or in place of it) is a quick way to show the calendar view of all archived copies.

The following flags can be appended to the datetime field to modify the format in which the archived content is displayed[1][2]:

  • id_ Identity - perform no alterations of the original resource, return it as it was archived.
  • js_ JavaScript - return document marked up as JavaScript.
  • cs_ CSS - return document marked up as CSS.
  • im_ Image - return document as an image.

Depending on the circumstances under which the page images were archived, the rendering of these pages may not be consistent; therefore, test each flag before using it in Wikipedia documents. When linking to pages which are no longer available, the id_ flag is the most transparent in presenting the intent of the original page. The following example demonstrates this for the Wikipedia page as it appeared on 30 September 2002 at 12:35:25 pm in the UTC timezone, without the Wayback Machine Toolbar being displayed. The datetime format is YYYYMMDDhhmmss with id_ appended.

Use the above format to link directly to a specific archive copy without the display of the Wayback Machine Toolbar.
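Appending a flag can be sketched as follows. The function name addDisplayFlag and its strict check on the URL shape are assumptions made for illustration; the list of flags is the one given above.

```javascript
// Flags listed above; each is appended directly to the 14-digit datetime.
const DISPLAY_FLAGS = ["id_", "js_", "cs_", "im_"];

// Hypothetical helper (illustration only): inserts a flag after the datetime
// field of an existing dated archive URL.
function addDisplayFlag(archiveUrl, flag) {
  if (!DISPLAY_FLAGS.includes(flag)) {
    throw new Error("unknown flag: " + flag);
  }
  // Capture everything up to the 14-digit datetime, then the rest of the URL.
  const parts = /^(https?:\/\/web\.archive\.org\/web\/\d{14})(\/.*)$/.exec(archiveUrl);
  if (parts === null) {
    throw new Error("expected a dated archive URL without a flag");
  }
  return parts[1] + flag + parts[2];
}

console.log(
  addDisplayFlag(
    "https://web.archive.org/web/20020930123525/http://www.wikipedia.org/",
    "id_"
  )
);
```

As recommended above, a link built this way should still be opened and checked before it is used in an article.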

Latest archive copy

The next example links to the most current version of the archived page.

Using the above format is discouraged. The request is redirected to the long-form URL of the latest archive copy, including its 14-digit datetime stamp, which defeats the purpose of using the archive to link directly to a specific old version of the page.

Likewise, a similar archive URL but with the number 1000 links to the oldest archive copy.

See also: Advanced URL locator hints and tips – Internet Archive

Limitations

Before October 2013 it would often take weeks or months for an archived copy of a web page to become available. Nowadays, a request to archive a particular web page is actioned immediately and the result usually made available within minutes.

JavaScript bookmarklet

A bookmarklet is a one-click button in a web browser that is stored like a bookmark but uses JavaScript to carry out certain actions. To use one when you are at a dead web page and want to visit the archives saved by the Wayback Machine, drag the following code to your browser's bookmarks toolbar, then give it a memorable name, such as Wayback:

javascript:void(window.open('https://web.archive.org/web/*/'+location.href));

To see a dead page

Then, when you are at a dead page, you may click the bookmarklet and it will automatically take you to the Wayback Machine's archives of that page.

The preceding code may not work for all users. In that case, you may try the following bookmarklet:

javascript:location.href='https://web.archive.org/web/*/'+document.location.href;

To save a live page

For a bookmarklet that allows you to manually archive the page you are visiting, store the following code in a bookmark on your browser's toolbar, with a name such as Wayback Save:

javascript:void(window.open('https://web.archive.org/save/'+location.href));

Mozilla Firefox Add-on

If you are using Mozilla Firefox, you can install a 404 error add-on that automatically tries to find a missing page in the Wayback Machine and provides a button similar to the one described above.

NOTE: As of May 5, 2013 or earlier, this add-on was not working properly. The developer writes (on the add-on page, under "About this Add-on"):

Several Firefox updates have broken one by one all of this plugin's functionalities.
If even the basic functionality is not working for you, here's a temporary fix: [...]

Alternative: Firefox add-on: Resurrect Pages, https://addons.mozilla.org/en-US/firefox/addon/resurrect-pages/

Using the wayback template

{{webarchive}} can create these links for you; use the |url=, |title= and |date= parameters to specify the URL, title and date. For example:

  • {{webarchive |url=https://web.archive.org/web/20010727112808/http://www.wikipedia.org/ |date=July 27, 2001 |title=Wikipedia }}
    Wikipedia at the Wayback Machine (archived July 27, 2001)

Without the date included:

  • {{webarchive |url=https://web.archive.org/web/*/http://www.wikipedia.org/ |date=* |title=Wikipedia }}
    Wikipedia at the Wayback Machine (archive index)

Note that the |date= parameter defaults to *.

Working with cite templates

{{citation}} and all of the Citation Style 1 templates support the |archive-url= parameter (note that the |archive-date= parameter is also required). Other citation templates may also support |archive-url=; see their documentation.
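Because |archive-date= must accompany the archive URL, it can be convenient to derive the date from the first eight digits of the datetime in that URL. The sketch below does this for a dated archive URL; archiveCitationParams is a hypothetical name, and the output spacing is only one possible layout.

```javascript
// Hypothetical helper (illustration only): derives the |archive-url= and
// |archive-date= parameters from a dated Wayback Machine URL.
function archiveCitationParams(archiveUrl) {
  // Capture year, month and day from the 14-digit datetime field.
  const parts = /^https?:\/\/web\.archive\.org\/web\/(\d{4})(\d{2})(\d{2})\d{6}\//.exec(archiveUrl);
  if (parts === null) {
    throw new Error("expected a dated archive URL");
  }
  const archiveDate = parts[1] + "-" + parts[2] + "-" + parts[3];
  return "|archive-url=" + archiveUrl + " |archive-date=" + archiveDate;
}

console.log(
  archiveCitationParams(
    "https://web.archive.org/web/20020930123525/http://www.wikipedia.org/"
  )
);
```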

  • {{citation |url=http://www.wikipedia.org/ |title=Wikipedia Main Page |archive-url=https://web.archive.org/web/20020930123525/http://www.wikipedia.org/ |archive-date=2002-09-30 |access-date=2005-07-06 }}
    "Wikipedia Main Page". Archived from the original on 2002-09-30. Retrieved 2005-07-06.
  • Where an archived resource notes its original publication date, use |date= in place of |access-date=.
  • When adding an archive URL to any citation where the original resource URL is still working, it is useful to add the |dead-url=no parameter. Should the original URL stop working, it is a simple job to either change this to |dead-url=yes or remove the parameter. With |dead-url=no, clicking the title in the footnote opens the original (live) URL and clicking "Archived" opens the archived copy. Otherwise, the title opens the archived page and "Original" opens the original link (dead unless it has been reinstated).
    "Wikipedia Main Page". Archived from the original on 2002-09-30. Retrieved 2005-07-06.

See also

References

  1. "Wayback Administrator Manual". Internet Archive. Archived from the original on 2014-01-20.
  2. "How can I view a page without the Wayback code in it?". Internet Archive. Archived from the original on 2013-08-06.