codePointAt be only forward looking is for iteration. For example, if I iterate over a string while looking both forward and backward, I would have to account for repeated surrogate pairs (i.e., every surrogate pair would be returned twice). If the algorithm is instead only forward looking, then I can check for a low surrogate at each iteration and, when one is found, skip ahead to the next code point.
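To make the forward-looking idea concrete, here is a minimal sketch. It uses the built-in String.prototype.codePointAt as a stand-in for the function under discussion, and the helpers isHighSurrogate, isLowSurrogate, and codePoints are hypothetical names made up for this example, not existing package APIs.

```javascript
// Hypothetical helper: tests whether a UTF-16 code unit is a high (leading) surrogate.
function isHighSurrogate( unit ) {
	return unit >= 0xD800 && unit <= 0xDBFF;
}

// Hypothetical helper: tests whether a UTF-16 code unit is a low (trailing) surrogate.
function isLowSurrogate( unit ) {
	return unit >= 0xDC00 && unit <= 0xDFFF;
}

// Collects the code points of a string using only forward-looking reads.
function codePoints( str ) {
	var out = [];
	var i = 0;
	while ( i < str.length ) {
		// Built-in forward-looking read (assumed stand-in for the discussed function):
		out.push( str.codePointAt( i ) );

		// If the current unit starts a surrogate pair, skip over the low surrogate
		// so that the pair is not visited a second time:
		if (
			isHighSurrogate( str.charCodeAt( i ) ) &&
			i+1 < str.length &&
			isLowSurrogate( str.charCodeAt( i+1 ) )
		) {
			i += 2;
		} else {
			i += 1;
		}
	}
	return out;
}
```

With this approach, each surrogate pair contributes exactly one entry to the result, and an unpaired surrogate is passed through as its own code unit value.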
Back in 2012, it was discussed to simply return the low surrogate. Not clear why: https://esdiscuss.org/topic/march-24-meeting-notes#content-24
This is probably because one can always get the high surrogate using
@stdlib/string/code-point-unit-count which takes a code point and returns the number of code units. This way, when we want to iterate over a string to return code points, we can first invoke codePointAt to get a code point and then codePointUnitCount to advance the index.
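A rough sketch of that iteration pattern follows. The codePointUnitCount function here is a local, hypothetical stand-in written for this example (the proposed package may end up with a different signature), and the built-in String.prototype.codePointAt is assumed in place of a stdlib implementation. forEachCodePoint is likewise a made-up name.

```javascript
// Hypothetical stand-in: returns the number of UTF-16 code units needed to encode a code point.
function codePointUnitCount( cp ) {
	// Code points above the Basic Multilingual Plane require a surrogate pair:
	return ( cp > 0xFFFF ) ? 2 : 1;
}

// Invokes a callback with each code point and the index at which it starts.
function forEachCodePoint( str, clbk ) {
	var cp;
	var i = 0;
	while ( i < str.length ) {
		cp = str.codePointAt( i ); // get the code point at the current index
		clbk( cp, i );
		i += codePointUnitCount( cp ); // advance by one or two code units
	}
}
```

Because the index advance is derived from the code point itself, the loop body never needs to inspect surrogates directly.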
next-extended-grapheme-cluster-break? Not clear to me, based on this thread (https://bugs.python.org/issue30717), whether we’d also want to have a utility for non-extended breaks.
It’s possible that we may not want to care about “legacy” clusters. If we do, we could always do
Yeah, the Unicode report explicitly states "the legacy grapheme cluster boundaries are maintained primarily for backwards compatibility with earlier versions of this specification". https://unicode.org/reports/tr29/
In : numGraphemeClusters( 'Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞' ) Out: 6
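For comparison, a similar count can be sketched with the built-in Intl.Segmenter, which segments text into extended grapheme clusters. This is a swapped-in technique for illustration only, not how numGraphemeClusters is implemented, and it assumes a runtime that ships Intl.Segmenter. The name countGraphemeClusters is hypothetical.

```javascript
// Counts extended grapheme clusters using the built-in Intl.Segmenter.
// Assumption: the runtime supports Intl.Segmenter.
function countGraphemeClusters( str ) {
	var segmenter = new Intl.Segmenter( undefined, {
		'granularity': 'grapheme'
	});
	var count = 0;
	var seg;

	// Each iteration yields one grapheme cluster segment:
	for ( seg of segmenter.segment( str ) ) {
		count += 1;
	}
	return count;
}
```

A base character followed by any number of combining marks is counted once, which is why heavily decorated text such as the example above yields one cluster per visible letter.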
Another alternative, @congzhangzh, is to use something like this utility: https://github.com/getify/moduloze
We’ve tried as best we can to make the project as easy to convert to ESM as possible, so a tool like moduloze should work quite easily on
@congzhangzh Quick update. The work being done by @rreusser is almost ready for use. We are just working on a few final edge cases.
Thanks to you both for the work. It's great to hear you are close to finishing. I will wait for you to finish ES module support and focus on other parts of my project first.