Text Data Processing for Humanists (In Person, East Campus)

4-8-26, 1:00pm to 3:00pm
Speaker(s)
Hannah Jacobs, Digital Humanities Consultant, Duke University Libraries' ScholarWorks Center for Open Scholarship
Contact
Hannah Jacobs
Email
hj24@duke.edu

REGISTRATION:

Humanities researchers can amass a considerable number of primary and secondary text-based sources for their research. These may include scans of archival documents such as manuscripts, newspapers, books, and other materials. They may also include varying-quality scans of secondary sources on loan from their own or other libraries. While close reading of this material is key for many humanities researchers, making use of so much data can also be supported by computation: by using computational tools to transcribe handwritten and printed text, scholars can query their text data to quickly find information. These processes, optical character recognition (OCR) for printed text and handwritten text recognition (HTR) for handwritten text, have improved significantly in recent years with machine learning and generative artificial intelligence. In this workshop, we will examine how these technologies work, practice using several tools for OCR and HTR, and consider the opportunities and challenges that can arise when using these technologies with different page layouts, languages, and scripts. Participants are encouraged to bring a laptop.

By the end of this workshop, you will be able to

- describe how OCR and HTR work in general terms;
- identify possible opportunities and challenges when applying OCR and HTR technologies to different page layouts, languages, and scripts;
- implement several OCR and HTR technologies in your research; and
- assess accuracy, clean up processed text, and document workflows for transparency.

This workshop will be facilitated by Hannah Jacobs, Digital Humanities Consultant with Duke University Libraries.

Location: East Campus Seminar Room

Participation: General discussion, structured activity, and time for questions.

Related LibGuide:

Attending this event fulfills the RCR-200 requirement for Faculty and Staff and is eligible for 714 RCR credit for graduate students, but participants must attend for 60 minutes and participate in discussion to receive credit.

Sponsor(s)
  • Arts & Sciences (A&S)
  • CTSI CREDO
  • Graduate School
  • Libraries
  • Office for Research and Innovation
  • Office of Research Administration (ORA)
  • Office of Research Support (ORS)
  • School of Medicine (SOM)
  • School of Nursing (SON)
  • Duke Office of Scientific Integrity (DOSI)