These are chat archives for synrc/n2o

14th
Feb 2017
Andy
@m-2k
Feb 14 2017 06:56
@seb3s what for? You can create PR to n2o and nitro repos if you want.
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 11:30
Hi, the reasons: consistency with js_escape & performance. The code already exists in the current version of Nitrogen.
I'm using it right now.
Andy
@m-2k
Feb 14 2017 11:31
no performance
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 11:32
I ask here for integration just to be sure I'm not missing something as I don't want to break existing code.
Andy
@m-2k
Feb 14 2017 11:32
okay
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 11:33
If I submit a list to html_encode, it is converted twice before being escaped:
to a binary first, then to a list, then escaped.
That probably costs a little, even if I haven't measured it :-)
Andy
@m-2k
Feb 14 2017 11:34
true
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 11:34
I do a lot of js_escape(html_encode(binary))
so I want to keep binaries all the way and avoid unnecessary conversions in the middle.
line 140 and following
Andy
@m-2k
Feb 14 2017 11:38
Has the Nitrogen code been tested with Unicode binary strings?
The https://github.com/erlang-unicode/ux library works with lists instead of binary strings
just trust and no bullshit
I remember Nitrogen escapes all non-ASCII characters, so that Nitrogen code is not an indicator of quality
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 11:45
Don't know what kind of testing has been done.
But I haven't seen any trouble on my side so far. I've tried pasting Russian & Asian languages to check that it is OK. Seems good to me, but I'm not a Unicode expert at all.
Andy
@m-2k
Feb 14 2017 11:47
No, testing is not necessary; Unicode should be handled character by character rather than byte by byte
Also, lists do not lag behind binaries in processing speed
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 12:03
OK, here is my understanding. I think the Nitrogen escaping is OK.
Only 8 characters are transformed, and all of them belong to the ASCII character set.
In UTF-8, all ASCII chars have a 0 in their highest bit, so they can't be mistaken for a byte of any other Unicode char.
The rest of the binary is just copied, 8 bits at a time (so several bytes may be used to form a char), but this is totally agnostic: if the binary is well formed on input, it will be well formed on output, as these parts are not touched.
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 12:09
So the function doesn't have to know which bytes form a char, as only very specific ASCII chars are escaped, and these chars can be fully identified without trouble by reading only 8 bits at a time.
I will keep it running for a while, and when I reach thousands of customers, I will validate my expectations or not :-)
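A minimal sketch of the argument above, for illustration only. It is not the Nitrogen or n2o code; the module name `utf8_escape_demo` and the function `escape/1` are hypothetical, and only five of the escaped characters are covered. Every byte of a multi-byte UTF-8 sequence is 16#80 or higher, so it can only fall through to the last copying clause.

```erlang
-module(utf8_escape_demo).
-export([escape/1]).

%% Escape a few HTML-significant ASCII characters and copy every other
%% byte unchanged. Bytes belonging to multi-byte UTF-8 sequences are all
%% >= 16#80, so they never match one of the ASCII clauses.
escape(Bin) when is_binary(Bin) -> escape(Bin, <<>>).

escape(<<"<", T/binary>>, Acc)  -> escape(T, <<Acc/binary, "&lt;">>);
escape(<<">", T/binary>>, Acc)  -> escape(T, <<Acc/binary, "&gt;">>);
escape(<<"&", T/binary>>, Acc)  -> escape(T, <<Acc/binary, "&amp;">>);
escape(<<"\"", T/binary>>, Acc) -> escape(T, <<Acc/binary, "&quot;">>);
escape(<<"'", T/binary>>, Acc)  -> escape(T, <<Acc/binary, "&#39;">>);
%% Any other byte, including every byte of a multi-byte char, is copied.
escape(<<B, T/binary>>, Acc)    -> escape(T, <<Acc/binary, B>>);
escape(<<>>, Acc)               -> Acc.
```

For example, `utf8_escape_demo:escape(<<195,169,"<">>)` (the two UTF-8 bytes of "é" followed by "<") returns `<<195,169,"&lt;">>`: the two non-ASCII bytes pass through untouched.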
Andy
@m-2k
Feb 14 2017 12:15
Ok )
Compatibility - check
Performance - not check
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 12:18
I will do some basic tests right now to see that.
Andy
@m-2k
Feb 14 2017 14:12
I'll try to find time
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 15:20
Ok so let's see my results
The list version is a little bit quicker.

w_test:bin_html_encode(10000000).
Code time=22900 (24242) ms

w_test:n2o_html_encode(10000000).
Code time=20390 (21509) ms

Sébastien Saint-Sevin
@seb3s
Feb 14 2017 15:26
%% -*- coding: utf-8 -*-
%% *****************************************************************************
-module(w_test).
-compile(export_all).


n2o_html_encode(N) ->
    statistics(runtime),
    statistics(wall_clock),

    for(1, N, fun() -> 
        wf:to_binary(wf:html_encode(<<"This is a html & test 'string with a <few> escapes inside.">>))
    end),

    {_, Time1} = statistics(runtime),
    {_, Time2} = statistics(wall_clock),
    io:format("Code time=~p (~p) ms~n", [Time1, Time2]).


bin_html_encode(N) ->
    statistics(runtime),
    statistics(wall_clock),

    for(1, N, fun() -> 
        w_convert:html_encode(<<"This is a html & test 'string with a <few> escapes inside.">>)
    end),

    {_, Time1} = statistics(runtime),
    {_, Time2} = statistics(wall_clock),
    io:format("Code time=~p (~p) ms~n", [Time1, Time2]).


empty_run(N) ->
    statistics(runtime),
    statistics(wall_clock),

    for(1, N, fun() -> ok end),

    {_, Time1} = statistics(runtime),
    {_, Time2} = statistics(wall_clock),
    io:format("Code time=~p (~p) ms~n", [Time1, Time2]).


for(N, N, F) -> F();
for(I, N, F) -> F(), for(I+1, N, F).


%% *****************************************************************************
%% END OF FILE
%% -----------------------------------------------------------------------------
This is for a 10 million run on my machine.
The Nitrogen code is poorly written and is twice as slow as the one I used in these tests.
I've modified it to use an accumulator and to append to the binary instead of prepending to it
ihe(<<">", T/binary>>, ET, Acc)              -> ihe(T, ET, <<Acc/binary, "&gt;">>);
ihe(<<"<", T/binary>>, ET, Acc)              -> ihe(T, ET, <<Acc/binary, "&lt;">>); 
ihe(<<"\"",T/binary>>, ET, Acc)              -> ihe(T, ET, <<Acc/binary, "&quot;">>);
ihe(<<"'", T/binary>>, ET, Acc)              -> ihe(T, ET, <<Acc/binary, "&#39;">>);
ihe(<<"&", T/binary>>, ET, Acc)              -> ihe(T, ET, <<Acc/binary, "&amp;">>);
ihe(<<"  ",T/binary>>, whites=ET, Acc)       -> ihe(T, ET, <<Acc/binary, " &nbsp;">>);
ihe(<<"\t",T/binary>>, whites=ET, Acc)       -> ihe(T, ET, <<Acc/binary, "&nbsp; &nbsp; &nbsp;">>);
ihe(<<"\n",T/binary>>, whites=ET, Acc)       -> ihe(T, ET, <<Acc/binary, "<br>">>);
ihe(<<H:8, T/binary>>, ET, Acc)              -> ihe(T, ET, <<Acc/binary, H>>);
ihe(<<>>, _ET, Acc) -> Acc.
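The timing loop in w_test above could also be written with `timer:tc/1`, which returns the elapsed time in microseconds together with the result of the fun. This is only a sketch: the module name `bench_sketch` and the function `run/2` are hypothetical, and it measures wall-clock time only, not the runtime figure that `statistics(runtime)` reports.

```erlang
-module(bench_sketch).
-export([run/2]).

%% Call F() N times and return the elapsed wall-clock time in milliseconds.
run(N, F) when is_integer(N), N > 0, is_function(F, 0) ->
    {Micros, ok} = timer:tc(fun() -> loop(N, F) end),
    Micros div 1000.

%% Plain countdown loop; the result of F() is discarded.
loop(0, _F) -> ok;
loop(N, F)  -> F(), loop(N - 1, F).
```

Running it once with an empty fun, like `empty_run/1` above, gives a baseline for the loop overhead that can be subtracted from the other measurements.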
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 15:33
So the best way to get a binary version for free is probably to do something like
html_encode(Bin) when is_binary(Bin) -> wf:to_binary(wf:html_encode(Bin)).
or better, when the html_encode output needs to be js_escaped, rewrite the js_escape code to use lists instead of binaries!
No more binary conversion in the middle, and that would be the quickest way
Andy
@m-2k
Feb 14 2017 15:36
fukin god
just create PR
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 15:37
Funny, because I've read a lot of articles stating that things go faster when we remove lists and do proper binary handling all along the road... Not sure this is really true now
Andy
@m-2k
Feb 14 2017 15:37
Do you experience a lack of performance?
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 15:37
no, but i like my code to be consistent and clean
:-)
Andy
@m-2k
Feb 14 2017 15:38
-define?
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 15:38
Yep I already have a few of them
could add a few more for sure
Andy
@m-2k
Feb 14 2017 15:40
Faster way: process the raw list/unicode-list/binary/utf8-binary and call list_to_binary/1 at the end,
without binary concatenation
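A minimal sketch of that suggestion, assuming a binary input and only three escaped characters. The module name `iolist_escape_demo` and the helper `encode/2` are hypothetical; this is not the code that was benchmarked in the chat. The accumulator is an iolist, and the only conversion to a binary is the single `list_to_binary/1` call at the end.

```erlang
-module(iolist_escape_demo).
-export([html_encode/1]).

%% Build the result as an iolist and flatten it once at the end.
html_encode(Bin) when is_binary(Bin) ->
    list_to_binary(encode(Bin, [])).

%% Each step wraps the previous accumulator in a new two-element list,
%% so no binary is concatenated while the input is being scanned.
encode(<<"<", T/binary>>, Acc) -> encode(T, [Acc, <<"&lt;">>]);
encode(<<">", T/binary>>, Acc) -> encode(T, [Acc, <<"&gt;">>]);
encode(<<"&", T/binary>>, Acc) -> encode(T, [Acc, <<"&amp;">>]);
encode(<<B, T/binary>>, Acc)   -> encode(T, [Acc, B]);
encode(<<>>, Acc)              -> Acc.
```

For example, `iolist_escape_demo:html_encode(<<"a <b> & c">>)` returns `<<"a &lt;b&gt; &amp; c">>`. Whether this beats appending to a binary depends on the input and the runtime, so it is worth measuring rather than assuming.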
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 15:42
let's see
Andy
@m-2k
Feb 14 2017 15:42
I looked )
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 15:55
w_test:bin_html_encode(10000000).
Code time=18070 (18894) ms
Yes, it goes a bit quicker than the list version, cool !!
Sébastien Saint-Sevin
@seb3s
Feb 14 2017 16:46
Wow
I've rewritten js_escape to use that as well
I see incredible results
this now runs twice as fast on my machine
could you please check ?
js_escape(undefined) -> [];
js_escape(Value) when is_list(Value)        -> binary_to_list(js_escape(iolist_to_binary(Value)));
js_escape(Value)                            -> js_escape(Value, <<>>).
js_escape(<<"\\", Rest/binary>>, Acc)       -> js_escape(Rest, [Acc | "\\\\"]);
js_escape(<<"\r", Rest/binary>>, Acc)       -> js_escape(Rest, [Acc | "\\r"]);
js_escape(<<"\n", Rest/binary>>, Acc)       -> js_escape(Rest, [Acc | "\\n"]);
js_escape(<<"\"", Rest/binary>>, Acc)       -> js_escape(Rest, [Acc | "\\\""]);
js_escape(<<"'",Rest/binary>>,Acc)          -> js_escape(Rest, [Acc | "\\'"]);
js_escape(<<"<script", Rest/binary>>, Acc)  -> js_escape(Rest, [Acc | "<scr\" + \"ipt"]);
js_escape(<<"script>", Rest/binary>>, Acc)  -> js_escape(Rest, [Acc | "scr\" + \"ipt>"]);
js_escape(<<C, Rest/binary>>, Acc)          -> js_escape(Rest, [Acc | [C]]);
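%% iolist_to_binary/1 is used because the initial accumulator is a bare
%% binary (<<>>): list_to_binary/1 raises badarg when the input is empty.
js_escape(<<>>, Acc) -> iolist_to_binary(Acc).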
Andy
@m-2k
Feb 14 2017 16:54
ok, later