Hi, I'm Shangbang Long, from Peking University, China. I'm interested in the Classical Chinese part. This project is really interesting and reminds me of one of my former projects at university, in which we used NLP techniques to analyze ancient literature and books in order to study Chinese society in earlier times. I think that, among the NLP functionalities, the tokenizer and POS tagger would be the most frequently used. I could also add some features specific to Classical Chinese, e.g. interchangeable borrowing recognition and ellipsis completion. I also happen to know a really great source of high-quality corpora with detailed labelling attached. It's easy to collect and clean. We can discuss more via email.
Also, I'm wondering how many students you are going to supervise, and how many mentors there are? @kylepjohnson