@kylepjohnson Does a language corpus have to be in its original script?
For example, I did some research on the Maithili language. It is primarily spoken in the eastern parts of India and in Nepal. The original script for Maithili is 'Tirhuta', but since the 20th century writers have preferred the 'Devanagari' script. 'Tirhuta' has not been digitized yet, and all of the digitized Maithili texts are now in the 'Devanagari' script.
@sainimohit23 No, it does not. For example, last summer's cuneiform/Akkadian project was done entirely in the Latin alphabet. This is because the scholars who digitize these texts chose the Latin alphabet (and we must use what they have created).