Department/Area: Institute for Quantitative Social Science (IQSS)
This posting is for a Data Science Fellow to participate in the design of the Automated History Archive. Many of the biggest challenges that our society faces have their roots in the past, and history can provide fundamental insights into their causes and potential solutions. However, vast amounts of historical quantitative data that could shed light on important issues remain locked in hard copy due to prohibitive curation costs. Historical data are often scattered irregularly among text in the original publications. Commercial OCR software performs poorly when tables are irregular, often requiring the user to manually denote the structures by drawing boxes. Off-the-shelf tools that assemble tables using clustering-based machine learning methods do not exist.
The Automated History Archive will automate the conversion of historical quantitative images into classified, machine-readable datasets on a large scale and deposit these in a collaborative, open-source data platform. Building on our initial successes, the fellow will play a core role in developing algorithm prototypes that integrate computer vision tools for recognizing data structures in raw images with machine learning techniques for classifying digitized table fragments.
The position provides a unique opportunity for a promising young scholar – planning to pursue a PhD in Engineering, Computer Science, or a quantitative Social Science – to be immersed in a top-notch research environment. The initiative is housed in Harvard's Institute for Quantitative Social Science (IQSS), which is dedicated to understanding and solving society's greatest problems through bold and collaborative social and data science. The fellow will work closely with the PI, Professor Melissa Dell. The fellow will be an active participant in the Harvard research community and will have opportunities to develop their own research agenda on issues related to the initiative.
There are two open positions, each with a one-year term and a potential opportunity for extension (conditional on funding availability and performance). The start date is flexible.
Applicants should have experience working with machine learning methods for image data. Beyond this, it is imperative that applicants have an interest in advancing methodology for non-standard uses, with the goal of automated extraction of structured data from large datasets of historical document scans. The Data Science fellowship requires innovating methods, not simply applying existing tools. The position requires a bachelor's degree.
Please see Special Instructions section.
Contact Email: email@example.com
Please do NOT apply via ARIES. Only applicants who follow the application instructions will be considered.
Interested candidates should send a CV, transcript, and one letter of reference to firstname.lastname@example.org. The subject line should contain the phrase
Internal Number: 7807
About Harvard University
Harvard University is devoted to excellence in teaching, learning, and research, and to developing leaders in many disciplines who make a difference globally. The University, which is based in Cambridge and Boston, Massachusetts, has an enrollment of over 20,000 degree candidates, including undergraduate, graduate, and professional students. Harvard has more than 360,000 alumni around the world. The University has twelve degree-granting Schools in addition to the Radcliffe Institute for Advanced Study, offering a truly global education. Established in 1636, Harvard is the oldest institution of higher education in the United States.