These are chat archives for YoEight/eventstore

20th
Jan 2017
Akii
@Akii
Jan 20 2017 10:35
well no idea how to debug this. can't even get reliable putStrLn outputs
Yorick Laupa
@YoEight
Jan 20 2017 10:59
You're learning the hard way how IO behaves
Keep pushing
Having say that, that doesn't mean I will not work on it
I'd be just cool if you figure it out before me 😊
Akii
@Akii
Jan 20 2017 17:42
Had to do some "real" work today; time to get back at it
Akii
@Akii
Jan 20 2017 18:13
okay I give up
this is beyond me
I fixed it
but I do not understand it a bit
I can also see that #62 is related
Akii
@Akii
Jan 20 2017 18:19
Akii/eventstore@2481581
this "fixes" the bug
Yorick Laupa
@YoEight
Jan 20 2017 18:19
What do you mean by you fix it ? There isna PR somewhere ?
Akii
@Akii
Jan 20 2017 18:20
of course not
look at the fix
Yorick Laupa
@YoEight
Jan 20 2017 18:22
Nice fix indeed !
Akii
@Akii
Jan 20 2017 18:22
:P
no but seriously
I've not been able to reproduce since
removing the two prints introduces the problem again
the console prints 0
that's all I know
I suspect this might have something to do with lazy evaluation
(like what else could it be)
the more interesting thing is that num is 0 here
where it should be 16
which might be related to #62
Akii
@Akii
Jan 20 2017 18:59
still on it btw
not giving up that easily :)
Akii
@Akii
Jan 20 2017 20:03
well I'm looking forward to tomorrow
I'll be there all day
I think this bug is non-trivial
pretty sure where we can start tracing both bugs
let (num, posM) = lastEventNumSM sm
printing num solves #61, ignoring num solves #62
Yorick Laupa
@YoEight
Jan 20 2017 20:16
Do you have some code to share, because it's hard to follow
Akii
@Akii
Jan 20 2017 20:25
sure
so I lack knowledge of why this happens
Yorick Laupa
@YoEight
Jan 20 2017 20:26
I know the code I was referring to your changes
Akii
@Akii
Jan 20 2017 20:27
my changes are just fixing symptoms
some evaluation is triggered by printing num
Yorick Laupa
@YoEight
Jan 20 2017 20:27
that doesn't tell why you did
Akii
@Akii
Jan 20 2017 20:28
I just played around and evaluated the outcome
all I know is printing num at this place seems to evaluate the program enough to continue working
people from #haskell-beginners pointed out that this is probably just a far side-effect
and looking at how far off lastEventNumSM sm is I can only guess
Yorick Laupa
@YoEight
Jan 20 2017 20:30
where do you put your print call ?
this is the only thing that makes a difference, ignore the rest of the commit
additionally, don't ask me why this fixes things
what I have not tried is to check if the subscription still works
I assume it does since the driver continues to connect to the store afterwards
and there has not been made any program flow changes other than evaluating something
Yorick Laupa
@YoEight
Jan 20 2017 20:35
I pushed a quick fix on fix/laziness-issue branch
can you pull it and tell me if it fixes it
Akii
@Akii
Jan 20 2017 20:35
gtg for a bit; back in 20 min
will do once I'm back
I'll be definitely online the whole weekend if you're not there in 20 min :D
Yorick Laupa
@YoEight
Jan 20 2017 20:37
If you're right, what I did fix it
YoEight/eventstore@6a5cba0
Akii
@Akii
Jan 20 2017 21:07
that looks like it could work
I'll check that out now
Yorick Laupa
@YoEight
Jan 20 2017 21:10
If it does that would be the first time I got bit by the lazy-evaluation in 6 years of Haskell
Akii
@Akii
Jan 20 2017 21:11
unable to confirm just yet
doesn't look like fixed
let me make sure I don't have a fuckup
Yorick Laupa
@YoEight
Jan 20 2017 21:13
then print isn't related
ok
Akii
@Akii
Jan 20 2017 21:14
also maybe you want to reproduce it yourself
have you been able to?
Yorick Laupa
@YoEight
Jan 20 2017 21:14
sure thing but you know laziness :-)
you've been so nice
Akii
@Akii
Jan 20 2017 21:15
I just want to avoid false negatives or positives
I'll apply "my fix" to your branch to be sure
uhm well
how to put this
after your change, the printing doesn't fix it anymore
Yorick Laupa
@YoEight
Jan 20 2017 21:18
alright I try to reproduce it
Akii
@Akii
Jan 20 2017 21:19
master + print still works
I can provide a test binary ^^
Yorick Laupa
@YoEight
Jan 20 2017 21:20
for me it means that something different then
because forcing doesn't solved anything
Akii
@Akii
Jan 20 2017 21:20
yes whatever print does
print does something
glguy on IRC said something
I'd also try to narrow down what the trigger is. It seems unlikely that it's important that that actual number is printed
I'd expect it to be related to laziness, or printing at all
unfortunately I'm all out of ideas
Yorick Laupa
@YoEight
Jan 20 2017 21:23
I confident I will find it though
just have to get started
need a coke first
Akii
@Akii
Jan 20 2017 21:23
now my initial thought was that this var is causing the inf lock
because if this never gets evaluated
but I'm only guessing at this point
a call to catchupOpEventHandler will always fill the MVar again no matter what
Akii
@Akii
Jan 20 2017 21:30
this might be the most interesting bug in my life so far
and it's funny because Haskell is able to throw an exception
compared to like Java where this bug would be a nightmare
Yorick Laupa
@YoEight
Jan 20 2017 21:37
in this case yes
but that isn't always the case
Yorick Laupa
@YoEight
Jan 20 2017 21:52
did you forget to put threadSleep on you code snippet on #61 ?
Akii
@Akii
Jan 20 2017 21:52
well I'm glad for every time the compiler can at least come up with a runtime exception
Yorick Laupa
@YoEight
Jan 20 2017 21:53
because obviously it got to fast to actually to do something like shutting down a server
Akii
@Akii
Jan 20 2017 21:53
no threadSleep required
Yorick Laupa
@YoEight
Jan 20 2017 21:53
are you a robot then ?
I can you kill the server and at the same si what happen when folding over the events of your subscription ?
Akii
@Akii
Jan 20 2017 21:54
wait a second :D
the minimal case in the comment is what you want to use for the test
es <- connect defaultSettings { s_retry = keepRetrying } (Static "localhost" 1113)
sub <- subscribeFrom es "strim" True (Just 18) (Just 500)
_ <- getLine
point that to a stream that exists
and then not press anything just let it be
Yorick Laupa
@YoEight
Jan 20 2017 21:55
there isn't a getLine in the snippet :-)
Akii
@Akii
Jan 20 2017 21:56
sure is
I just copied it from the issue
(check the comment)
Yorick Laupa
@YoEight
Jan 20 2017 21:56
nope there isn't check on #61 to see
not a big obviously
I had it on my code
something to note
I don't know how many events you got on your stream
Akii
@Akii
Jan 20 2017 21:57
Screen Shot 2017-01-20 at 22.57.02.png
it doesn't matter
Yorick Laupa
@YoEight
Jan 20 2017 21:57
that #62
I'm on #61
Akii
@Akii
Jan 20 2017 21:57
nope
I literally clicked on #61 to make that screenshot right now
#62 just describes the weird issue with the resending events
Yorick Laupa
@YoEight
Jan 20 2017 21:58
ah sorry I was reffering to the snippet of the ticket description
my bad
Akii
@Akii
Jan 20 2017 21:58
ye that just does a forever
the more minimal test case is in the comment
establish connection, create sub, wait
Yorick Laupa
@YoEight
Jan 20 2017 21:59
what I wanted to say is
if you have less than 500 events on strim stream
it will be retrieve through readStreamForward first
what I mean is it seems it's the first phase of the catchup subscription that's at fault
Akii
@Akii
Jan 20 2017 22:00
okay, this is the case
Yorick Laupa
@YoEight
Jan 20 2017 22:01
not the subscription part itself
Akii
@Akii
Jan 20 2017 22:01
the stream has exactly 18 events in it
well turning down the batch size should fix that then
Yorick Laupa
@YoEight
Jan 20 2017 22:02
let me confirm first :-)
Akii
@Akii
Jan 20 2017 22:03
fix as in it changed
the print no longer works
Yorick Laupa
@YoEight
Jan 20 2017 22:03
I don't get the purpose of getLine right after subscribing to a stream
Akii
@Akii
Jan 20 2017 22:03
this will just stop the thread
Yorick Laupa
@YoEight
Jan 20 2017 22:04
yes but what the purpose
Akii
@Akii
Jan 20 2017 22:04
as I said yesterday establishing a connection to the store happens without an action
it just does
Yorick Laupa
@YoEight
Jan 20 2017 22:04
ah
Akii
@Akii
Jan 20 2017 22:04
now
that alone does not change anything
Yorick Laupa
@YoEight
Jan 20 2017 22:04
you call subscribeFrom, that's an action
Akii
@Akii
Jan 20 2017 22:04
even without
it establishes a connection
anyway
what breaks it is calling subscribeFrom
Yorick Laupa
@YoEight
Jan 20 2017 22:05
let's get this from the beginning shall we because I'm lost
Akii
@Akii
Jan 20 2017 22:05
sure
es <- connect defaultSettings { s_retry = keepRetrying } (Static "localhost" 1113)
_ <- getLine
Yorick Laupa
@YoEight
Jan 20 2017 22:05
ok
Akii
@Akii
Jan 20 2017 22:05
use this and watch the output of the eventstore log
it will acknowledge the connection with a msg like this:
[48206,27,22:02:07.515] External TCP connection accepted: [Normal, 127.0.0.1:50591, L127.0.0.1:1113, {fc07d12b-5bec-4598-8232-4e3bf3639162}].
you can also see the connection in the web frontend
and the heartbeats associated with it
at this stage you can kill and restart the store as often as you like
Yorick Laupa
@YoEight
Jan 20 2017 22:08
sure
so now I'm supposed to kill the server right ?
Akii
@Akii
Jan 20 2017 22:08
exactly
Yorick Laupa
@YoEight
Jan 20 2017 22:08
ok
Akii
@Akii
Jan 20 2017 22:08
ctrl+c
Yorick Laupa
@YoEight
Jan 20 2017 22:08
and restart it
?
Akii
@Akii
Jan 20 2017 22:09
yes
the driver meanwhile will try to re-connect
Yorick Laupa
@YoEight
Jan 20 2017 22:09
yes, successfully
Akii
@Akii
Jan 20 2017 22:09
yep
do it again
Yorick Laupa
@YoEight
Jan 20 2017 22:09
do what ?
Akii
@Akii
Jan 20 2017 22:09
kill the server, observe the reconnect
Yorick Laupa
@YoEight
Jan 20 2017 22:10
kill and restart ?
done
Akii
@Akii
Jan 20 2017 22:10
driver did reconnect?
Yorick Laupa
@YoEight
Jan 20 2017 22:10
so now, I can hit enter ?
yes
Akii
@Akii
Jan 20 2017 22:10
no hitting enter
Yorick Laupa
@YoEight
Jan 20 2017 22:10
it did reconnect
Akii
@Akii
Jan 20 2017 22:10
do it again
this is the last time :)
Yorick Laupa
@YoEight
Jan 20 2017 22:10
I did it 2 times already
Akii
@Akii
Jan 20 2017 22:10
perfect
so you can say it reconnected 3 times
now use this script
Yorick Laupa
@YoEight
Jan 20 2017 22:11
is it enought or should I do it one time again ?
Akii
@Akii
Jan 20 2017 22:11
es <- connect defaultSettings { s_retry = keepRetrying } (Static "localhost" 1113)
sub <- subscribeFrom es "strim" True (Just 18) (Just 500)
_ <- getLine
no hitting enter
start the server, start the script
observe connection; kill, restart, kill again, restart
Yorick Laupa
@YoEight
Jan 20 2017 22:11
you're to follow here
Akii
@Akii
Jan 20 2017 22:12
after the second kill the driver will fail to re-connect
Yorick Laupa
@YoEight
Jan 20 2017 22:12
for now I just have my getLine right after the connect call
Akii
@Akii
Jan 20 2017 22:12
inf lock occurs
Yorick Laupa
@YoEight
Jan 20 2017 22:12
so when I'm suppose to hit enter ?
Akii
@Akii
Jan 20 2017 22:12
never
halting the execution is just a means to an end
Yorick Laupa
@YoEight
Jan 20 2017 22:13
and why on the 2nd snippet, getLine is placed after subscribeForm call
?
Akii
@Akii
Jan 20 2017 22:13
that's what breaks the connection
Yorick Laupa
@YoEight
Jan 20 2017 22:13
slow down
Akii
@Akii
Jan 20 2017 22:13
:D
Yorick Laupa
@YoEight
Jan 20 2017 22:14
the first snippet involve getLine to be right after connect
so if I have to do nothing except kill and restart the server 2 times, how can I have getLine to be place after subscribeForm without killing the first program, recompile and launch it again
Akii
@Akii
Jan 20 2017 22:16
not following
Yorick Laupa
@YoEight
Jan 20 2017 22:16
likewise mate :-)
Akii
@Akii
Jan 20 2017 22:16
forget the first program
test is done
we're now at the second case
Yorick Laupa
@YoEight
Jan 20 2017 22:16
you show 2 different programs but I'm suppose to only kill and restart the server
Akii
@Akii
Jan 20 2017 22:16
okay :D
Yorick Laupa
@YoEight
Jan 20 2017 22:17
I'm missing a link here
Akii
@Akii
Jan 20 2017 22:17
1st program we showed that everything is ok
Yorick Laupa
@YoEight
Jan 20 2017 22:17
alright
so I kill it
Akii
@Akii
Jan 20 2017 22:17
after hitting enter you can close the connection and everythings all right
absolutely
Yorick Laupa
@YoEight
Jan 20 2017 22:17
edit the code then do the same thing to see the issue ?
Akii
@Akii
Jan 20 2017 22:17
now kill that program
Yorick Laupa
@YoEight
Jan 20 2017 22:18
alright
Akii
@Akii
Jan 20 2017 22:18
just forget about those two lines
we know they work
Yorick Laupa
@YoEight
Jan 20 2017 22:18
I'm slow it's friday night
Akii
@Akii
Jan 20 2017 22:18
now start the second one, with the subscription
np :D
whole different test case
so you run that, the eventstore says you're connected
now you kill the store and start it again
and then you do that again
Yorick Laupa
@YoEight
Jan 20 2017 22:19
ok
Akii
@Akii
Jan 20 2017 22:19
and then your program crashes
on the third start of the store the driver will not reconnect
keep in mind though that nothing happens with the actual program
it'll just keep waiting for your input there
but you can observe what the driver does through the store
afterwards you can hit enter and it will throw the exception
Yorick Laupa
@YoEight
Jan 20 2017 22:20
ok so what you mean is, in the 2nd program we no longer see the driver connect after kill and restart for the 2nd time ?
Akii
@Akii
Jan 20 2017 22:21
forgot that line: shutdown es >> waitTillClosed es
that basically throws
Yorick Laupa
@YoEight
Jan 20 2017 22:21
I don't have those
Akii
@Akii
Jan 20 2017 22:21
yes
Yorick Laupa
@YoEight
Jan 20 2017 22:21
ok
Akii
@Akii
Jan 20 2017 22:21
it doesn't matter
you can observe that the driver does not reconnect
Yorick Laupa
@YoEight
Jan 20 2017 22:21
I confirm the behavior then
Akii
@Akii
Jan 20 2017 22:21
nice
this is good so we can reproduce this
the interesting thing is that this happens after the subscription call
and mind you, only after catchup sub
I've not tried persistent subs but I did try regular sub
the latter one just kills the sub on connection lost
Yorick Laupa
@YoEight
Jan 20 2017 22:23
nice investigation man
Akii
@Akii
Jan 20 2017 22:23
which is fine, I personally would just turn that into a catchup sub but <details>
:D
Yorick Laupa
@YoEight
Jan 20 2017 22:23
so that's #61
Akii
@Akii
Jan 20 2017 22:23
I think 62 might be related
remotely
maybe
Yorick Laupa
@YoEight
Jan 20 2017 22:24
I switch it from investigation to bug
Akii
@Akii
Jan 20 2017 22:24
I saw :D
#62 is also very interesting
I'm kinda lost there because I don't know the inner workings
Yorick Laupa
@YoEight
Jan 20 2017 22:25
#61 is a pain in the ass though
:-)
Akii
@Akii
Jan 20 2017 22:25
ye
agreed
this is not your standard bug
the num you discard here
Yorick Laupa
@YoEight
Jan 20 2017 22:26
somehow #62 seems easy to solve
Akii
@Akii
Jan 20 2017 22:26
seems to be right
for whatever reason lastEventNumSM sm seems to be "outdated"
I've not put as much effort into solving this one though
Yorick Laupa
@YoEight
Jan 20 2017 22:27
let's get #61 solved first
Akii
@Akii
Jan 20 2017 22:27
agreed
especially since the 62 you need to handle anyways
not in that magnitude and nobody will ever kill your server but still :D
Yorick Laupa
@YoEight
Jan 20 2017 22:29
I think I find it out
Akii
@Akii
Jan 20 2017 22:31
next thing to try would be if the driver recovers after an event push from the store
Yorick Laupa
@YoEight
Jan 20 2017 22:36
it seems the Runner thread is still up after the 2nd kill-restart
Akii
@Akii
Jan 20 2017 22:36
how do you debug this?
Yorick Laupa
@YoEight
Jan 20 2017 22:36
it shouldn't
Akii
@Akii
Jan 20 2017 22:37
I looked briefly into the inner-inner workings of the driver
so I know there are like 3 threads
Yorick Laupa
@YoEight
Jan 20 2017 22:38
at line 364 in Database/EventStore/Internal/Execution/Production.hs I put this line case traceShow (worker, "Got killed" :: String) $ asyncExceptionFromException e of
I added a traceShow to see how the threads behave after each connection drop
the 2nd kill-restart Runner thread is still up
so I'll fix that first
λ yoeight [~/code/eventstore-bug] → stack exec eventstore-bug-exe
(Reader ThreadId 15,"Got killed")
(Writer ThreadId 17,"Got killed")
(Runner ThreadId 16,"Got killed")
(Reader ThreadId 25,"Got killed")
(Writer ThreadId 27,"Got killed")
Akii
@Akii
Jan 20 2017 22:40
trying to reproduce this
so strange
(Reader ThreadId 257,"Got killed")
(Writer ThreadId 259,"Got killed")
(Runner ThreadId 258,"Got killed")
(Reader ThreadId 262,"Got killed")
(Writer ThreadId 264,"Got killed")
(Runner ThreadId 263,"Got killed")
(Reader ThreadId 267,"Got killed")
(Writer ThreadId 269,"Got killed")
(Runner ThreadId 268,"Got killed")
(Reader ThreadId 272,"Got killed")
(Writer ThreadId 274,"Got killed")
(Runner ThreadId 273,"Got killed")
git diff
Yorick Laupa
@YoEight
Jan 20 2017 22:43
I spotted it
here's the bad code
Akii
@Akii
Jan 20 2017 22:44
now I'm curious :D
Yorick Laupa
@YoEight
Jan 20 2017 22:44
    -- Waits the runner thread to deal with its jobs list.
    atomically $ do
        end <- isEmptyCycleQueue _jobQueue
        unless end retrySTM
this never returns
on the 2nd kill-restart
which prevents closing function to complete
which mean Runner queue is not empty
Akii
@Akii
Jan 20 2017 22:47
oha
Yorick Laupa
@YoEight
Jan 20 2017 22:49
this is weirder than the gimp in pulp fiction movie !
Akii
@Akii
Jan 20 2017 22:49
no idea how you even end up there
what gimp
:D
I should re-watch that movie
Yorick Laupa
@YoEight
Jan 20 2017 22:49
this dude
Akii
@Akii
Jan 20 2017 22:49
hahhahaha
oh.. THAT dude
Yorick Laupa
@YoEight
Jan 20 2017 22:50
how I end up here ?
Akii
@Akii
Jan 20 2017 22:50
ye like I wasn't even able to reproduce
Yorick Laupa
@YoEight
Jan 20 2017 22:50
look at cruising function in Production.hs
we should receive a Stopped message
which kill the manager thread
closing is called at line 561
when the manager get kill
Akii
@Akii
Jan 20 2017 22:54
I see
Yorick Laupa
@YoEight
Jan 20 2017 22:57
spotted
That's bad though
Akii
@Akii
Jan 20 2017 22:58
oh oh
Yorick Laupa
@YoEight
Jan 20 2017 22:58
there a job that prevents runner thread to complete
when I hit ENTER
it works
Akii
@Akii
Jan 20 2017 22:59
not following
hit enter where
Yorick Laupa
@YoEight
Jan 20 2017 22:59
in the program we use to reproduce the bug
Akii
@Akii
Jan 20 2017 23:00
the second one?
Yorick Laupa
@YoEight
Jan 20 2017 23:00
yes
Akii
@Akii
Jan 20 2017 23:00
if you hit enter there I think it just stops
you need to shutdown >> wait
Yorick Laupa
@YoEight
Jan 20 2017 23:00
nope
Akii
@Akii
Jan 20 2017 23:00
that will throw
Yorick Laupa
@YoEight
Jan 20 2017 23:00
I don't have those
Akii
@Akii
Jan 20 2017 23:01
then I think something weird happens and the connection just keeps
remember when I got like 3 connections
Yorick Laupa
@YoEight
Jan 20 2017 23:01
I think that's the first phase of the catchup subscription which is at fault
I'll go to bed though
Akii
@Akii
Jan 20 2017 23:02
good idea :D
Yorick Laupa
@YoEight
Jan 20 2017 23:02
will resume maybe tomorrow, but surely on sunday
Akii
@Akii
Jan 20 2017 23:02
sorry I can't be as much of an assistance
I'll be there
Yorick Laupa
@YoEight
Jan 20 2017 23:02
man you don't how much you help me out here
much appreciated
Akii
@Akii
Jan 20 2017 23:03
hey you don't know how much you help me !
have a good night; we'll fix this next few days
Yorick Laupa
@YoEight
Jan 20 2017 23:03
I'm expecting to nuked that bug next time I'll be working on it !
good night
Akii
@Akii
Jan 20 2017 23:05
good night xD