User:Harej/sandbox

From Wikipedia, the free encyclopedia
sandbox

since 2005

{{Pageset definition
| namespaces     =
| categories     =  
| category-depth = 
| wdq1           =
| petscan1       =
| domain-links1  =
| sql1           =
| links-here1    =
| transclusions1 =
| links-on-page1 =
}}

Notes

Missing articles

Citation watchlist script

https://en.wikipedia.org/w/index.php?title=Capital_punishment_in_the_United_States&diff=prev&oldid=1203024750

https://en.wikipedia.org/w/api.php?action=compare&fromrev=1203018841&torev=1203024750&format=json

<a class="mw-changeslist-diff" href="/w/index.php?title=Zoology&amp;curid=34413&amp;diff=1203018841&amp;oldid=1203024750">diff</a>

This diff adds a new sentence to the article and also adds a new link to a source.

In this one diff, these two sources are cited:

Given a watchlist:

  1. Isolate the revision ID and previous ID from each line in the watchlist.
  2. Every five seconds, check whether there is a revision ID / previous ID pair that has not been checked yet.
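A minimal sketch of these two steps, written in Python for readability (the user script itself would do the same thing in JavaScript against the live page). It assumes the usual MediaWiki changes-list convention that a diff link's `diff` parameter holds the revision ID and `oldid` the previous ID, and it keys on the `mw-changeslist-diff` class seen in the HTML above. The names `collect_pairs`, `PairTracker`, and `poll` are hypothetical.

```python
import time
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit


class DiffLinkCollector(HTMLParser):
    """Collects (previous ID, revision ID) pairs from watchlist diff links."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: list[tuple[int, int]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag != "a" or "mw-changeslist-diff" not in classes:
            return
        # HTMLParser has already decoded &amp; to & inside attribute values.
        params = parse_qs(urlsplit(attributes.get("href") or "").query)
        revision_id = params.get("diff", [""])[0]
        previous_id = params.get("oldid", [""])[0]
        # Skip links whose IDs are missing or non-numeric.
        if revision_id.isdigit() and previous_id.isdigit():
            self.pairs.append((int(previous_id), int(revision_id)))


def collect_pairs(watchlist_html: str) -> list[tuple[int, int]]:
    """Step 1: isolate every revision ID / previous ID pair on the page."""
    collector = DiffLinkCollector()
    collector.feed(watchlist_html)
    collector.close()
    return collector.pairs


class PairTracker:
    """Remembers which pairs have already been checked."""

    def __init__(self) -> None:
        self.checked: set[tuple[int, int]] = set()

    def unchecked(self, pairs: list[tuple[int, int]]) -> list[tuple[int, int]]:
        # dict.fromkeys drops duplicates while keeping the original order.
        fresh = [p for p in dict.fromkeys(pairs) if p not in self.checked]
        self.checked.update(fresh)
        return fresh


def poll(get_html, handle_pairs, interval: float = 5.0, rounds: int = 1) -> None:
    """Step 2: every `interval` seconds, pass any not-yet-checked pairs on."""
    tracker = PairTracker()
    for i in range(rounds):
        fresh = tracker.unchecked(collect_pairs(get_html()))
        if fresh:
            handle_pairs(fresh)
        if i < rounds - 1:
            time.sleep(interval)
```

Keeping the checked pairs in a set means each pair is looked up only once, however many times the watchlist is re-scanned.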

Given a pair (or batch of them):

  1. Use the "action=compare" endpoint.
  2. Extract URLs with a regular expression (insert the joke about now having two problems).
  3. Isolate domain names from the URLs.
  4. Check those domains against an internal representation of RSP (hardcoded in the script for now).
  5. If there is a match, add an indicator next to the diff: a red triangle "!" for the warn list, a yellow circle "?" for the caution list.
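The same five steps for a single pair, sketched in Python under the same caveats. `WARN_DOMAINS` and `CAUTION_DOMAINS` are hypothetical stand-ins for the hardcoded RSP lists (only dailymail.co.uk comes from this page), the regex is deliberately loose, and this version scans the whole diff body rather than only the added lines. The compare URL mirrors the api.php example above; reading the diff from the `"*"` key assumes the default `formatversion=1` JSON output.

```python
import json
import re
from typing import Optional
from urllib.parse import urlencode, urlsplit
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"

# Hypothetical stand-ins for the hardcoded RSP warn and caution lists.
WARN_DOMAINS = {"dailymail.co.uk"}
CAUTION_DOMAINS = {"example.org"}

# Loose on purpose: a URL ends at whitespace, quotes, angle brackets or wikitext delimiters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'|\]\[{}]+")


def compare_url(previous_id: int, revision_id: int) -> str:
    """Step 1: build the action=compare request for one pair."""
    query = {"action": "compare", "fromrev": previous_id,
             "torev": revision_id, "format": "json"}
    return f"{API}?{urlencode(query)}"


def fetch_diff_html(previous_id: int, revision_id: int) -> str:
    """Step 1 (network): call the API and return the diff body."""
    request = Request(compare_url(previous_id, revision_id),
                      headers={"User-Agent": "citation-watchlist-sketch/0.1"})  # hypothetical UA
    with urlopen(request, timeout=10) as response:
        data = json.load(response)
    return data["compare"]["*"]  # assumes formatversion=1 output


def extract_urls(diff_text: str) -> list[str]:
    """Step 2: pull URLs out of the diff text."""
    return URL_PATTERN.findall(diff_text)


def domain_of(url: str) -> str:
    """Step 3: reduce a URL to its host name, minus a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def matches(host: str, domains: set[str]) -> bool:
    """Step 4: true for a listed domain or any subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def indicator(diff_text: str) -> Optional[str]:
    """Step 5: '!' for a warn-list hit, '?' for a caution-list hit, else None."""
    hosts = {domain_of(u) for u in extract_urls(diff_text)}
    if any(matches(h, WARN_DOMAINS) for h in hosts):
        return "!"
    if any(matches(h, CAUTION_DOMAINS) for h in hosts):
        return "?"
    return None
```

Checking the warn list first means a diff citing sources from both lists gets the stronger "!" marker.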

The problems I have with this approach:

  • Each user performs the lookups and computations themselves, rather than relying on a centralized service that does this work for them.

In the future, when a centralized service does this work because we are doing something more complicated than screening against RSP, the user script:

  1. Asks for the user's consent to access the external service the data comes from
  2. Scans each revision ID / previous ID pair on the watchlist
  3. Submits the pairs to the service in batches
  4. Retrieves the results
  5. Updates the watchlist HTML based on the retrieved results
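A sketch of that client-side flow in Python, assuming a hypothetical service that accepts JSON batches of pairs and answers with a verdict per revision ID. No such service exists yet, so the payload shape, the response shape, `BATCH_SIZE`, and the `send` callable (which stands in for the actual HTTP request) are all placeholders.

```python
import json
from typing import Iterator

BATCH_SIZE = 50  # hypothetical; the service's real batch limit is not defined


def batches(pairs: list[tuple[int, int]], size: int = BATCH_SIZE) -> Iterator[list[tuple[int, int]]]:
    """Step 3: split the scanned pairs into fixed-size batches."""
    for start in range(0, len(pairs), size):
        yield pairs[start:start + size]


def request_body(batch: list[tuple[int, int]]) -> str:
    """Hypothetical payload: one object per previous ID / revision ID pair."""
    return json.dumps({"pairs": [{"revid": rev, "previd": prev} for prev, rev in batch]})


def apply_results(response_text: str, markers: dict[int, str]) -> dict[int, str]:
    """Steps 4-5: record the marker to show next to each revision's diff link.

    Assumes a hypothetical response {"results": {"<revid>": "warn" | "caution" | null}}.
    """
    symbols = {"warn": "!", "caution": "?"}
    for revid, verdict in json.loads(response_text).get("results", {}).items():
        if verdict in symbols:
            markers[int(revid)] = symbols[verdict]
    return markers


def check_watchlist(pairs: list[tuple[int, int]], consented: bool, send) -> dict[int, str]:
    """Run steps 1-5; `send` posts one request body and returns the response text."""
    if not consented:
        return {}  # Step 1: without consent, never contact the external service.
    markers: dict[int, str] = {}
    for batch in batches(pairs):
        apply_results(send(request_body(batch)), markers)
    return markers
```

Passing `send` in as a parameter keeps the sketch independent of any particular HTTP client or endpoint.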

What about this "service"? If I set up WRDB as an ongoing, self-updating service, all the service would need to do is look up the revision ID in WRDB. At the moment, though, WRDB only supports a one-time build, and domain information is not stored directly in the database. Even so, this approach will help with supporting non-URL references in the future.

Citation Watchlist testing

https://dailymail.co.uk