Ancient Lives, an online crowdsourcing project started in 2011 at the University of Oxford, uses the public to transcribe ancient papyri. The project is relaunching in 2017. James Brusuelas, a researcher in papyrology and digital philology at the University of Oxford, came to the University of Waterloo on Jan. 5 to speak about the Ancient Lives project and its progress so far.
The project started as an idea to put the collection online and invite the general public to help with transcribing the material.
“One of the biggest problems about this collection isn’t so much that we publish it slowly, it’s that we know very little in terms of its mass amount,” Brusuelas said. “And we really do just need the initial transcriptions to kind of do a rough classification and figure things out and to actually help prioritize the collection.”
The project's initial approach focused on transcription and pattern recognition. One problem the team came across was that accuracy dropped as the handwriting shifted to cursive, Brusuelas said.
“What we kind of expected was that for a very clear hand, we can have volunteers basically agreeing with professionals at the rate of 90 to 95 per cent,” he said. “Now, as we also expected, as this got cursive, the percentage started to drop. So I think a couple examples were very terrible … they were going down to 50 to 54 per cent agreement.”
Another problem was processing the data generated by the clicks of millions of people taking part in transcribing the works. To make sense of that data, Brusuelas and his team used algorithms and machine learning, which allowed them to arrive at a consensus from the volunteers' input.
“We’re still working on how to expedite this process for those who are assigned to work on this text,” he said.
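The consensus step and the agreement percentages mentioned above can be illustrated with a small sketch. This is not the Ancient Lives team's method, which the talk did not describe in detail; it is a plain per-character majority vote, and it assumes the volunteer transcriptions have already been turned into aligned strings of equal length, whereas the real project started from clicks on images. The function names `consensus` and `agreement_rate` are made up for this illustration.

```python
from collections import Counter


def consensus(transcriptions):
    """Return a per-character majority vote over the transcriptions.

    Assumption for this sketch: every transcription has the same length
    and position i refers to the same letter on the papyrus.
    """
    result = []
    for chars in zip(*transcriptions):
        # Pick the character that most volunteers chose at this position.
        most_common_char, _ = Counter(chars).most_common(1)[0]
        result.append(most_common_char)
    return "".join(result)


def agreement_rate(candidate, reference):
    """Fraction of positions where candidate matches the reference reading."""
    matches = sum(a == b for a, b in zip(candidate, reference))
    return matches / len(reference)


# Three hypothetical volunteer readings of the same five-letter word.
volunteers = ["ΛΟΓΟΣ", "ΛΟΓΘΣ", "ΛΟΓΟΣ"]
crowd_reading = consensus(volunteers)

# Compare one volunteer and the crowd reading against a hypothetical expert reading.
expert_reading = "ΛΟΓΟΣ"
single_rate = agreement_rate(volunteers[1], expert_reading)
crowd_rate = agreement_rate(crowd_reading, expert_reading)
```

In this toy case the volunteer who misread one letter agrees with the expert on four of five characters, while the combined reading recovers the full word, which is the basic reason for pooling many transcriptions.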
Brusuelas hopes to expand the project to identify authors based on handwriting styles.
“If we could apply some computer vision and start working in handwriting identification, then another way we can interrogate this mass amount database is by images in the data to kind of see if we can put together things that might belong together in some way,” Brusuelas said. “I would be very interested to see if we could actually use the machine learning with handwriting styles the way humans have classified these and to see where the inconsistencies crop up.”