@kevinstadler This is a great question. Thanks for digging into the issue. I'll venture a short answer here to get things started …
1) I wrote v2 of the API, which has two basic functions: serving texts and doing NLP processing. However, @lukehollis leads the web projects and can say exactly how they will use the API in the near future.
2) Luke can speak to the workflow of JSON and TEI-formatted texts.
3) About writing a better reader: we have had lots of thoughts … but not many decisions. @diyclassics has done some work, within the core Python project, to create a reader. However, there is nothing we'd quite call official yet.
4) So you are correct in seeing this as a weak spot throughout the CLTK, though I believe we have some OK ad hoc solutions. This topic falls as much into the frontend as the backend, so I think @lukehollis should give his full opinion too, about whether this is a priority for GSoC '18.