Wikipedia:GLAM/British Library/Case Studies

As part of the current residency, a review is being undertaken of Wikimedia work done in the Library from 2013 to the present day. Participants are asked to answer the following questions about their project, and resultant reports will be posted on the British Library website in due course.

Survey Questions
Question Further Details
Suggested Title Please suggest a title of the project you worked on.
Staff Members Please give names for staff members involved in this project, and their roles within the Library.
Department Please comment under which department this work was carried out, if any.
Collaborators Community collaborators, if any.
Time Period The rough time period over which the work was undertaken.
Lay Summary of Objectives (<100 words) A brief summary of the aims and objectives you set out to address in doing this work.
How Was Wikimedia Used? (<250 words) How was Wikimedia utilised in this work? Which Wiki projects did you make use of?
Any Other Resources? (<150 words) Please list any other useful resources that played a role in this work, such as tools, software or hardware.
Outcomes (<200 words) Can you briefly summarise the key outcomes of this work?
Impact (<200 words) What tangible impact can you see this work having/has it already had? Are there other use scenarios?
Links/Resource Please provide links to any publicly available materials arising from this work.

This format was adapted from work done by Wikimedia UK in partnership with the University of Edinburgh, as shown here.

Bengali Books

2021

Suggested Title Wikisource Competitions to Proofread 19th Century Bengali Books
Staff Members Tom Derrick, Digital Curator, Two Centuries of Indian Print
Department n/a
Collaborators Bodhisattwa.
Time Period 15 March – 14 April 2021 and 1 September – 30 November 2021
Lay Summary of Objectives (<100 words) The main objective was to make our digitised Bengali books more widely accessible to readers based in South Asia. Bengali Wikisource seemed a good fit as a place to host out-of-copyright printed books, and it has an interface written in Bangla.

We also wanted to improve the accuracy of the transcriptions of these books, which had been automatically generated using Optical Character Recognition (OCR). Finally, we were interested in exporting and reusing the corrected transcriptions to improve keyword search of the books within the Library’s image presentation service and to make them available as downloadable datasets.

How Was Wikimedia Used? (<250 words) We used the Pattypan tool to batch upload PDFs of our books to Wikimedia Commons, along with metadata describing publication details for each book.

Once the books were in Wikimedia Commons, our collaborator at the West Bengal Wikimedians User Group created index pages within Wikisource for each book and then used Google’s OCR tool to create transcriptions of the text. As these transcriptions contained spelling errors, he suggested running a competition to crowdsource corrections that would improve the reading experience for Wikisource users. Two competitions were run in 2021, both hosted on Bengali Wikisource, through which entrants could edit the OCR-generated text and proofread pages. Our partners at the West Bengal Wikimedians User Group are responsible for validating the proofread transcriptions before they are made publicly available through Wikisource.

Any Other Resources? (<150 words) We found it would be advantageous to store the structured metadata for these books in Wikidata so that it can be queried and linked to other relevant Wiki pages to form part of a larger network of information about the authors and publishers who produced the books.

Our collaborator used the Cradle tool to create Wikidata entries for the books written in Bangla, based on the metadata we had uploaded to Wikimedia Commons.

Outcomes (<200 words) Across the two competitions, more than 5,000 pages of text were fully corrected by the community of volunteers.

The corrected text is being validated by Wikisource administrators. More than 35 books have so far been made publicly available on Wikisource, and more will be added as they pass validation. In both competitions, top contributors were awarded prizes, which included books, t-shirts, mementos and certificates.

Impact (<200 words) Ultimately, this initiative led to the creation of thousands of pages of accurate text that could not have been achieved in the same timeframe without the help of crowdsourcing.

Users of Wikisource can view page images and transcripts side by side and automatically translate them into more than 100 languages through the browser, or download them for further exploration. Although we are not currently able to export corrected transcriptions directly from Wikisource in a format that can be ingested into the Library’s LibSafe system, we intend to convert the format of exported transcriptions, enabling accurate and reliable searching of these books within the Library’s Universal Viewer. This initiative is one of several that have strengthened the connections between the Library and the Wikimedia Foundation, and it has directly led to further collaborations and events. Following the first competition in summer 2021, we delivered a joint talk with our competition partner in India for British Library staff, in which we shared our experience of working with the Bengali books on Wikisource.

Links/Resource

Contemporary Literature

2020

Suggested Title CPPL and CLCA Collections Web Presence Audit
Staff Members Eleanor Casson, Manuscripts Cataloguer & Eleanor Dickens, Curator
Department Contemporary Literary and Creative Archives and Contemporary Politics and Public Life
Time Period July-August 2020
Lay Summary of Objectives (<100 words) The aim of the project was to improve accessibility to the archive collections in the Contemporary Literary and Creative Archives (CLCA) and Contemporary Politics and Public Life (CPPL) teams, particularly for new audiences not reached through traditional access points. The objectives were to update Wikipedia articles with details about each collection and an external link to the catalogue, and to update Wikidata entries with a link to the archive. In some cases, Wikipedia and Wikidata entries did not exist for the creators of archives we hold at the Library; where possible, we created Wikipedia pages and Wikidata entries for these individuals and organisations, following the guidelines of the Wikipedia community.
How Was Wikimedia Used? (<250 words) We used Wikipedia and Wikidata for this project. The department had already used Wikipedia in the past to attach external links to the catalogue or to include information about the archive under a Legacy sub-heading. This project ensured that all catalogued collections were linked to a relevant article and that any previously created entries were updated. I used the Wikipedia Library/Cultural Professionals project page for guidance, particularly with regard to templates for linking archival material. The department had not used Wikidata before to link to collections. We decided to use it because it links to Google’s Knowledge panels, which will improve accessibility to the catalogue information.
Any Other Resources? (<150 words) We did not use any other resources for this project. However, the intention is to use Wikidata’s QuickStatements tool through OpenRefine to check the values of the Wikidata entries, to ensure that they remain the same and that the links are still in place.
Outcomes (<200 words) The main outcome from the project was creating and updating Wikipedia pages, and creating archive entries on Wikidata for around 100 archive collections in the Contemporary Literary and Creative Archives and Contemporary Politics and Public Life departments at the Library. The project improved accessibility to the Library’s archive catalogue and created greater exposure for the archive collections to audiences that may not have discovered the collections otherwise. It is our intention that the processes followed in this project are embedded into the cataloguing workflow to ensure that all new collections are also linked on Wikipedia and Wikidata.
Impact (<200 words) Unfortunately, we do not currently monitor access to the archive catalogue or how frequently researchers request collections. We are therefore unable to quantify the impact that using Wikipedia and Wikidata has had on the accessibility of the archive collections.
Links/Resource We have not written any papers about this work. The only links are to examples of the Wikipedia and Wikidata entries we created:

Programme for Collaborative Cataloguing

2020 - 2021

Suggested Title Programme for Collaborative Cataloguing Wikidata Pilot
Staff Members
  • Erin Burnand – ISNI Operations Implementation Manager/AC Team manager
  • Chris Robinson – Metadata Specialist (ISNI)
  • Stavroula Angoura – Metadata Specialist (ISNI)
Department Collection Metadata
Time Period August 2020 to October 2021
Lay Summary of Objectives (<100 words) We took part in the PCC Pilot both as members of the Programme for Collaborative Cataloguing and in our capacity as NACO (Name Authority Cooperative Programme) contributors.

As a team specialising in identity management, we intended to treat the pilot as a proof of concept, with the main aim of comparing Wikidata with well-established identity management processes currently used in the team – NACO and ISNI.

We intended to look at the following:

  • Production time
  • Training of staff
  • QA tools
  • Documentation requirements
  • Batch loading
How Was Wikimedia Used? (<250 words) First phase

This comprised an initial familiarisation/training period (which took longer than we anticipated) and experimentation with creating Wikidata records for persons/identities.

As we wanted to compare processes, we created Wikidata items in the same way as for ISNI & NACO: the basis was items coming into the library, and a record was created for the author/associated organisation.

Second phase

Here, we thought about ways to incorporate Wikidata into our existing workflows in order to add value.

We ideally wanted to avoid creating Wikidata items from scratch and concentrated on batch processes.

We felt we had more room to experiment using ISNI & Wikidata. At the time, ISNI was heavily involved with music metadata, so we identified a way to match ISNI records with their corresponding Wikidata records by using the ISNI source ID ‘MUSICBRAINZ’.
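
As a rough illustration only, the matching step could be sketched as a join on a shared MusicBrainz identifier. The actual work was carried out in OpenRefine, not in code; the record layout, field names and identifiers below are all hypothetical and exist purely for the example.

```python
# Hypothetical sample data: none of these identifiers are real.
# Each ISNI record lists the source IDs it was built from.
isni_records = [
    {"isni": "0000000000000001", "sources": {"MUSICBRAINZ": "mbid-aaa"}},
    {"isni": "0000000000000002", "sources": {"MUSICBRAINZ": "mbid-bbb"}},
    {"isni": "0000000000000003", "sources": {"VIAF": "12345"}},
]

# Each Wikidata item may hold a MusicBrainz ID and, optionally, an ISNI.
wikidata_items = [
    {"qid": "Q_EXAMPLE_1", "musicbrainz_id": "mbid-aaa", "isni": None},
    {"qid": "Q_EXAMPLE_2", "musicbrainz_id": "mbid-bbb", "isni": "0000000000000002"},
    {"qid": "Q_EXAMPLE_3", "musicbrainz_id": "mbid-zzz", "isni": None},
]


def match_isni_to_wikidata(isni_records, wikidata_items):
    """Return (qid, isni) pairs for items that lack an ISNI but share a MusicBrainz ID."""
    # Index ISNI records by their MUSICBRAINZ source ID, skipping records without one.
    isni_by_mbid = {
        record["sources"]["MUSICBRAINZ"]: record["isni"]
        for record in isni_records
        if "MUSICBRAINZ" in record["sources"]
    }
    additions = []
    for item in wikidata_items:
        if item["isni"] is not None:
            continue  # already carries an ISNI, nothing to add
        isni = isni_by_mbid.get(item["musicbrainz_id"])
        if isni is not None:
            additions.append((item["qid"], isni))
    return additions


additions = match_isni_to_wikidata(isni_records, wikidata_items)
print(additions)
```

Only items that share a MusicBrainz ID with an ISNI record and do not yet carry an ISNI are proposed for update, which keeps a batch edit from overwriting existing values.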

Using OpenRefine, we were able to add the ISNI URI to the Wikidata record, adding nearly 18,000 ISNI URIs and increasing the percentage of Wikidata records containing ISNI URIs from 8.4% to 28.4%.
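
For readers who want to sanity-check these figures, the short calculation below estimates the size of the record set they imply. It is a back-of-envelope sketch, not a reported result: it assumes the total number of records in the measured set stayed constant and treats "nearly 18,000" as exactly 18,000.

```python
# Figures from the text, with percentages held as tenths of a percent
# so that the arithmetic stays in exact integers.
added_uris = 18_000        # "nearly 18,000", treated as exact (assumption)
share_before = 84          # 8.4%
share_after = 284          # 28.4%

# Assumption: the total number of records in the measured set did not change,
# so the rise in share is explained entirely by the added URIs.
point_increase = share_after - share_before            # 200 tenths = 20 points
implied_total = added_uris * 1000 // point_increase    # records in the set

with_isni_before = implied_total * share_before // 1000
with_isni_after = with_isni_before + added_uris

print(implied_total, with_isni_before, with_isni_after)
```

Under those assumptions, a rise of 20 percentage points from 18,000 additions implies a set of roughly 90,000 records, of which about 7,560 carried an ISNI URI beforehand.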

We felt this had the benefit of increasing the visibility of ISNI on a popular, mainstream platform, and hopefully increasing visitors to the ISNI website.

Any Other Resources? (<150 words) n/a
Outcomes (<200 words) Production time

Initially we hoped Wikidata would serve as a NACO ‘lite’, but we found that creating Wikidata records took the same time as, if not longer than, metadata creation in ISNI and NACO.

Training of staff

We found the Wikidata interface very user-friendly and easy to get started on. There is no doubt that Wikidata is more accessible than ISNI or NACO, not least because of the membership requirements associated with them. While basic editing is user-friendly, we thought that the more complex elements of Wikidata, such as SPARQL, could be a barrier for some cataloguers in the library. Although we did not explore this ourselves, we discovered that other libraries involved in the pilot experimented with converting Wikidata to MARC21, with a view to creating basic authority records as a means of addressing training/resourcing issues in a library setting.

Quality Assurance/Checking

There were concerns within the library community regarding the quality of the data and the contents. There were also concerns about deletions by moderators on grounds such as notability, although we did not experience this.

Documentation requirements

We found the documentation on Wikidata very thorough and useful. If Wikidata were adopted within Metadata Creation, it would probably be necessary to create some additional in-house guidance.

Batch processes

This was successful for us and something we would like to explore further.

Impact (<200 words) While Wikidata had many positives, we found that there would be little value in incorporating it into our workflows at this time, mainly because we already work with two well-established identifier management standards.

However, given the considerable community buy-in and recognition it receives, it would be foolish not to maintain a good level of awareness within the Authority Control team, with a view to participating in (most probably project-based) work in the future.

Lotus Sutras

2023

Suggested Title Lotus Sutra Project images on Wikimedia Commons.
Staff Members Laura Parsons / Tan Wang Ward (Project Manager, Lotus Sutra Manuscript Digitisation Project – Tan was on maternity leave and Laura was acting in the role)
Department Asian and African Studies
Time Period September 2021 to June 2022
Lay Summary of Objectives (<100 words) Upload images of a set of scroll manuscripts that were conserved and digitised as part of the Lotus Sutra Project to Wikimedia Commons, with accompanying metadata on Wikidata. The objective was to increase access and visibility by showcasing some items, in order to direct users to the wider digitised collection available on the IDP website (http://idp.bl.uk/).
How Was Wikimedia Used? (<250 words) Images were added to Wikimedia Commons and accompanying metadata was added to Wikidata.
Any Other Resources? (<150 words) We shared files via Google Drive.
Outcomes (<200 words) At the end of this work, a subset of the items from the Lotus Sutra Project (10 items) is displayed on Wikimedia Commons, with the metadata on Wikidata. The metadata was also enhanced by a university placement student, Xiaoyan Yang from University College London.
Impact (<200 words) We are hoping to increase online visibility of the Lotus Sutra Project and the digitised collections of the IDP, including directing interested users to the IDP website to access more content.

We also shared knowledge to enhance the technical skills of the IDP team, including how to use Wikimedia projects. These skills could be used to add more IDP content in future.

Links/Resource Working With Wikidata and Wikimedia Commons: Poetry Pamphlets and Lotus Sutra Manuscripts by Xiaoyan Yang.