Help:Using WebCite: Difference between revisions

See also: drop useboxes
 
{{ambox|type=content|text='''As of July 14, 2019, WebCite does not accept any new archive requests; previously archived pages can still be accessed, but this service cannot be used to make any new archives.'''}}
{{Wikipedia how to|WP:WEBCITE}}
<br>
{{ambox|type=content|text='''As of July 14, 2019, WebCite does not accept any new archive requests; historically archived pages can still be accessed, but this service cannot be used to make any new archives.'''}}
 
[[WebCite]] is an intermittently available [[web archiving]] service located at [https://www.webcitation.org/ https://www.webcitation.org/]. The archive no longer accepts new snapshots, and usage on English Wikipedia has been deprecated ([[Wikipedia:Village_pump_(proposals)/Archive_159#RfC:_Deprecate_webcitation.org_aka_WebCite|RfC]]); i.e., without good reason you should not add new WebCite archives to the English Wikipedia, and you should try to move existing snapshots to other archive providers or refactor the citation to a different live link. WebCite's future is uncertain, and its reliability is poor. For example, it was offline for 1 year and 8 months during the period 2021-2023. Outages are tracked on the talk page of [[WebCite]].
[[WebCite]] is an on-demand [[web archiving]] service located at [https://www.webcitation.org/ https://www.webcitation.org/]. By using [[WebCite]], Wikipedia editors can reduce [[link rot]] by preserving a copy of an online [[WP:RS|source]] that can be accessed if the original page is moved, changes, or disappears. Not all web pages can be archived, however.<ref name="wcfaq" group="nb">[https://www.webcitation.org/faq WebCite FAQ:] A page may not be archived for a number of reasons. The page owner may specifically prohibit archiving of their content through no-cache / no-archive tags, or via a robot exclusion policy on their site. The content may be inaccessible from the WebCite® network (this is particularly likely if you are attempting to access subscription based content which your institution subscribes to on its users' behalf). Also, the content may be unreadable by the WebCite® archiver (complex JavaScript based pages, or ones involving browser checks sometimes cause our archive engine to fail).</ref>
 
==Long-form URLs==
WebCite can archive a range of content, including [[HTML]] web pages, [[PDF]] files, [[Style sheet (web development)|style sheets]], [[JavaScript]], and [[digital image]]s. Another web archiving service is the [[Wikipedia:Using the Wayback Machine|Wayback Machine]]. The two operate differently, and certain pages can be archived by one but not the other. The Wayback Machine takes snapshots of webpages at certain times as well as having an archiving process initiated by user requests; WebCite requires someone to actively archive a link.
Links archived with [[WebCite]] should appear in long format (see [[Wikipedia_talk:Using_archive.is#RfC:_Should_we_use_short_or_long_format_URLs.3F|RfC]]).
 
An example long format URL:
==How to archive==
There are many ways to submit a web page to WebCite for archiving. If you are new to using WebCite, give the Website form method a go first. The other methods are better suited to those who use WebCite regularly.
 
:<code><nowiki>https://www.webcitation.org/5eWaHRbn4?url=http://www.example.com/</nowiki></code>
===Website form===
This method is easy to use but is slower than the other methods as it requires going to the WebCite website each time you want to archive a web page.
 
The 9-digit "Snapshot ID," similar to [[URL shortening]] services, contains a base 62 coded timestamp that can be extracted by bots and other programs. It also serves as a unique page ID. This is followed by the original URL which helps protect against malicious code that is hiding an inappropriate link, such as spam.
# Go to <code>[https://www.webcitation.org/archive https://www.webcitation.org/archive]</code>.
# Enter the URL of the web page you wish to archive into the "URL to Archive [url]" field.
# Enter an email address into the "Your (citing author) E-mail Address [email]" field.
# After entering the URL of the page you wish to archive and an email address into the form, click the "Submit" button. You will be sent to a page containing a link to the archive URL of the web page you wished to archive.
# An email stating whether the archive process succeeded or failed will be sent to your email address. If it was successful, the archive URL will also be included in the email.
# It is recommended that you view the archived page to check if the archive process has been successful.
 
This archive URL can be inserted into the <code>archive-url=</code> and its supporting <code>archive-date=</code> and <code>url-status=</code> parameters in any of the [[Wikipedia:Citation templates|citation templates]]. If the original URL is [[Wikipedia:Link rot|no longer accessible]], the <code>url-status</code> parameter value should be set to <code>dead</code>. If the original URL is still accessible, the <code>url-status</code> parameter value should be set to <code>live</code>.
===Bookmarklet===
Put simply, a [[bookmarklet]] is a web browser bookmark which, instead of going to a web page, performs a certain function. When you click the WebCite bookmarklet, it takes the URL of the page you are currently viewing and submits it to WebCite for archiving. This method is easy to set up, easy to use and fast. To get the most out of this method, it is recommended that you have your Bookmarks/Favorites bar visible or at least have your bookmarks accessible within a click or two. This method only allows you to archive the page you are currently looking at; to archive a different web page, you will have to use another method.
 
<code><nowiki><ref>{{cite web |last= |first= |title= |work= |publisher= |date= |url= |archive-url= |archive-date= |url-status= }}</ref></nowiki></code>.
# To '''set up''' the bookmarklet, go to <code>[https://www.webcitation.org/bookmarklet https://www.webcitation.org/bookmarklet]</code>.
# Enter an email address. An email stating whether the archive process succeeded or failed will be sent to this address. If it was successful, the archive URL will also be included in the email.
# Click the "Build my Bookmarklet" button. Some text will appear.
# At the end of point 1, there is a "WebCite® this page" link. This is your personal bookmarklet. Drag this link into your Bookmarks/Favorites bar.
# To '''use''' the bookmarklet, simply click on it when you are on a web page you wish to archive. You will be sent to a page containing a link to the archive URL of the web page you wished to archive.
# It is recommended that you view the archived page to check if the archive process has been successful.
 
==Searching for previously archived web pages==
===Firefox smart keyword===
Web pages previously archived at WebCite can be found through a search form at https://www.webcitation.org/query.
Firefox smart keywords are commonly used to perform searches through the Firefox address bar or to open a bookmark by typing a keyword into the Firefox address bar. Here we are going to use a smart keyword to submit a URL to WebCite for archiving. This method is moderately simple to set up, easy to use and is fast.
 
There is also an API. Please contact [[User:GreenC]] for information on how this works.
# To '''set up''' the smart keyword, hit Ctrl+Shift+B to open your Bookmarks Library (or click the orange Firefox button at the top left of the window, then go to "Bookmarks", then "Show All Bookmarks").
# Browse to a location you would like to save the smart keyword bookmark in.
# In the menu at the top of the window, click "Organize", then "New Bookmark".
# Enter a name for the bookmark (e.g. <code>WebCite</code>).
# Enter <code><nowiki>https://www.webcitation.org/archive?url=%s&email=yourname@example.com</nowiki></code> into the Location field, replacing <code><nowiki>yourname@example.com</nowiki></code> with your email address. An email stating whether the archive process succeeded or failed will be sent to this address. If it was successful, the archive URL will also be included in the email.
# Enter a keyword for the bookmark. You should choose something short and this keyword must not already be used for another bookmark. (e.g. <code>wc</code>)
# Click the "Add" button. Close the Bookmarks Library.
# To '''use''' the smart keyword, add the keyword you chose ("<code>wc</code>" in the above example) followed by a space ("<code>&nbsp;</code>") in front of the URL of the web page you would like to archive in the Firefox address bar. (e.g. If you are using "wc" as your keyword, the text in the address bar would be <code>wc&nbsp;<nowiki>http://www.example.com/pageyouwantoarchive.html</nowiki></code>).
# Hit Enter. You will be sent to a page containing a link to the archive URL of the web page you wished to archive.
# It is recommended that you view the archived page to check if the archive process has been successful.
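
As a rough illustration of what the smart keyword does, the sketch below substitutes a page URL for the <code>%s</code> placeholder in the bookmark location. The helper name <code>expand_keyword</code> is hypothetical, and the percent-encoding step is an assumption about browser behaviour rather than something this page specifies. The code only builds a string and contacts no server.

```python
from urllib.parse import quote

# Bookmark location from the steps above; %s is the placeholder the browser fills in.
TEMPLATE = "https://www.webcitation.org/archive?url=%s&email=yourname@example.com"


def expand_keyword(template: str, typed_url: str) -> str:
    """Replace %s with the typed URL (hypothetical helper).

    Assumption: the typed text is percent-encoded before substitution.
    """
    return template.replace("%s", quote(typed_url, safe=""))


expanded = expand_keyword(TEMPLATE, "http://www.example.com/pageyouwantoarchive.html")
print(expanded)
```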
 
==Moving to a different provider==
===Chrome search engine===
You can help [[Wikipedia:Village_pump_(proposals)/Archive_159#RfC:_Deprecate_webcitation.org_aka_WebCite|deprecate WebCite!]]
Although this is created through Chrome's search engine feature, this functions just like a smart keyword in Firefox. This method is moderately simple to set up, easy to use and is fast.
 
Ideas to get rid of WebCite links:
# To '''set up''' the "search engine", right click the address bar and select "Edit search engines...". At the bottom of the list that comes up, you can add a "search engine".
# Search archive.org and archive.today - although bots already did this, bots are sometimes imperfect and a manual search could find something the bots missed.
# Enter a name for the "search engine" in the first field (e.g. <code>WebCite</code>).
# Find a different origin URL on the live web. For example, if the origin URL is to a Reuters story published in the NYT, there is a good chance that same Reuters story is available elsewhere. Use Google to search.
# Enter a keyword for the "search engine" in the second field. You should choose something short and this keyword must not already be used. (e.g. <code>wc</code>)
# Saving the WebCite link at archive.today works well and is recommended. However, '''do not save at archive.org'''; see "Things to be cautious of" below.
# Enter <code><nowiki>https://www.webcitation.org/archive?url=%s&email=yourname@example.com</nowiki></code> into the third field, replacing <code><nowiki>yourname@example.com</nowiki></code> with your email address. An email stating whether the archive process succeeded or failed will be sent to this address. If it was successful, the archive URL will also be included in the email.
# PDF files at WebCite do not save correctly at archive.today
# Hit Enter to save the "search engine".
# To '''use''' the "search engine", add the keyword you chose ("<code>wc</code>" in the above example) followed by a space ("<code>&nbsp;</code>") in front of the URL of the web page you would like to archive in the Chrome address bar (e.g. If you are using "wc" as your keyword, the text in the address bar would be <code>wc&nbsp;<nowiki>http://www.example.com/pageyouwantoarchive.html</nowiki></code>).
# Hit Enter. You will be sent to a page containing a link to the archive URL of the web page you wished to archive.
# It is recommended that you view the archived page to check if the archive process has been successful.
 
To save a WebCite URL at archive.today, follow these steps:
== Limitations ==
WebCite honors the [[robots exclusion standard]], as well as no-cache and no-archive tags and will not archive sites that disallow archiving.
 
# Save https://www.webcitation.org/5QE8rvIqH?url=http://www.birdlife.org at archive.today which will generate short-form URL https://archive.today/Jrvg8
For example, ''[[The New York Times]]'' has a robots.txt file at https://www.nytimes.com/robots.txt which includes:
# URL shortening is disallowed on Wikipedia; click the "share" button to see the long form: https://archive.today/20070710111036/https://www.webcitation.org/5QE8rvIqH?url=http://www.birdlife.org
:<code>User-agent: *</code>
# A potential [[SNAFU]] is that there might also be https://archive.today/20070710111036/http://www.birdlife.org but this will probably contain different content than what you just saved at https://archive.today/20070710111036/https://www.webcitation.org/5QE8rvIqH?url=http://www.birdlife.org
:<code>Disallow: /aponline/</code>
:<code>Disallow: /archives/</code>
:<code>Disallow: /reuters/</code>
Thus, archive requests for URLs within those folders, and any other similarly listed folder of the New York Times website will be rejected.
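
The effect of such rules can be checked with a small sketch using Python's standard <code>urllib.robotparser</code>. The rules are the three lines quoted above; the user-agent string <code>ExampleArchiver</code> and the test URLs are hypothetical, since this page does not state which user agent WebCite's archiver sent.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt excerpt quoted above.
RULES = """\
User-agent: *
Disallow: /aponline/
Disallow: /archives/
Disallow: /reuters/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def may_archive(url: str, agent: str = "ExampleArchiver") -> bool:
    """Return True if the rules allow the (hypothetical) agent to fetch url."""
    return parser.can_fetch(agent, url)


# Hypothetical URLs: one inside a disallowed folder, one outside.
print(may_archive("https://www.nytimes.com/archives/story.html"))  # False
print(may_archive("https://www.nytimes.com/section/world"))        # True
```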
 
Things to be cautious of:
==Use within Wikipedia==
Links archived with [[WebCite]] should appear in long format (see [[Wikipedia_talk:Using_archive.is#RfC:_Should_we_use_short_or_long_format_URLs.3F|RfC]]).
 
* It is not possible to save WebCite URLs at archive.org. A save may appear to succeed, but the method is unreliable. For the reasons, see [[Talk:WebCite#general_problem|this discussion]].
An example long format URL:
* Be aware of "content drift". When a web page has content that changes over time, such as stock prices or weather updates, this is called "drift". When the original WebCite snapshot was created, it captured the intended information, e.g., the current status of a typhoon on a certain day and hour. However, such a page may change quickly, and any later snapshot of that same page will contain different information. Thus, when looking for snapshots at other archive providers, be aware of content drift for these types of pages.
 
:<code><nowiki>https://www.webcitation.org/5eWaHRbn4?url=http://www.example.com/</nowiki></code>
 
The 9-digit "Snapshot ID," similar to [[URL shortening]] services, contains a base 62 coded timestamp that can be extracted by bots and other programs. It also serves as a unique page ID. This is followed by the original URL which helps protect against malicious code that is hiding an inappropriate link, such as spam.
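
A minimal sketch of how a program might split a long-format URL and decode the Snapshot ID. The digit ordering (<code>0-9</code>, <code>A-Z</code>, <code>a-z</code>) and the unit (microseconds since 1970-01-01 UTC) are assumptions not stated on this page. They are chosen because the ID <code>5QE8rvIqH</code> then decodes to the same <code>20070710111036</code> timestamp that appears in the archive.today example on this page. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit

# Assumed base-62 digit order: 0-9, then A-Z, then a-z.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_long_url(long_url: str) -> tuple[str, str]:
    """Return (snapshot_id, original_url) from a long-format WebCite URL."""
    parts = urlsplit(long_url)
    snapshot_id = parts.path.strip("/")
    # Everything after "url=" is the original URL, kept verbatim.
    _, _, original_url = parts.query.partition("url=")
    return snapshot_id, original_url


def decode_snapshot_id(snapshot_id: str) -> datetime:
    """Decode the ID as base 62, assuming microseconds since the Unix epoch."""
    value = 0
    for ch in snapshot_id:
        value = value * 62 + ALPHABET.index(ch)
    return EPOCH + timedelta(microseconds=value)


snapshot_id, original_url = split_long_url(
    "https://www.webcitation.org/5QE8rvIqH?url=http://www.birdlife.org"
)
print(snapshot_id, original_url)
print(decode_snapshot_id(snapshot_id).strftime("%Y%m%d%H%M%S"))  # 20070710111036
```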
 
A second optional long format URL:
 
:<code><nowiki>https://www.webcitation.org/query?url=http://www.example.com&date=20091104</nowiki></code> (date in YYYYMMDD or YYYY-MM-DD format)
 
This forgoes the "Snapshot ID" and uses a date argument instead. Either is appropriate for use within Wikipedia.
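
The date-based form can be assembled mechanically. The sketch below is one way to do it; the function name <code>query_url</code> is hypothetical, and the original URL is inserted unencoded, as in the example above.

```python
from datetime import date


def query_url(original_url: str, snapshot_date: date) -> str:
    """Build the date-based long-format URL (hypothetical helper)."""
    return (
        "https://www.webcitation.org/query"
        f"?url={original_url}&date={snapshot_date:%Y%m%d}"
    )


example = query_url("http://www.example.com", date(2009, 11, 4))
print(example)
# https://www.webcitation.org/query?url=http://www.example.com&date=20091104
```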
 
This archive URL can be inserted into the <code>archive-url=</code> and its supporting <code>archive-date=</code> and <code>url-status=</code> parameters in any of the [[Wikipedia:Citation templates|citation templates]]. If the original URL is [[Wikipedia:Link rot|no longer accessible]], the <code>url-status</code> parameter value should be set to <code>dead</code>. If the original URL is still accessible, the <code>url-status</code> parameter value should be set to <code>live</code>.
 
<code><nowiki><ref>{{cite web |last= |first= |title= |work= |publisher= |date= |url= |archive-url= |archive-date= |url-status= }}</ref></nowiki></code>.
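
As an illustration only, the sketch below fills in the archive-related parameters of the template above. The function name <code>cite_web</code>, the title and the choice of fields are hypothetical. The URLs and the date come from the birdlife.org example on this page.

```python
def cite_web(title: str, url: str, archive_url: str,
             archive_date: str, original_is_live: bool) -> str:
    """Return a <ref> with a {{cite web}} template (hypothetical helper)."""
    # url-status is "live" while the original URL still works, otherwise "dead".
    status = "live" if original_is_live else "dead"
    return (
        f"<ref>{{{{cite web |title={title} |url={url} "
        f"|archive-url={archive_url} |archive-date={archive_date} "
        f"|url-status={status} }}}}</ref>"
    )


citation = cite_web(
    title="BirdLife International",  # hypothetical title
    url="http://www.birdlife.org",
    archive_url="https://www.webcitation.org/5QE8rvIqH?url=http://www.birdlife.org",
    archive_date="2007-07-10",
    original_is_live=False,
)
print(citation)
```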
 
==Searching for previously archived web pages==
Web pages previously archived through WebCite are accessible through [https://www.webcitation.org/query a searchable database]. Users may search by URL, date, or by "Snapshot ID".
 
==See also==
===Docs===
* [[Wikipedia:Link rot]], how-to guide for prevention of link rot
* [[Help:Archiving a source]], how-to guide
** [[Help:Using the Wayback Machine]], how-to guide
** [[Help:Using archive.today]]
** [[Talk:Perma.cc#Perma.cc_and_Wikipedia|Using Perma.cc]]
===Tools===
* {{tl|User WebCite}}, userbox
* [[User:UBX/WebCite]], alternative userbox
* [[User:UBX/WebCite2]], "I donated" userbox
 
==Notes==