These are chat archives for anacrolix/torrent

28th
Dec 2017
Matt Joiner
@anacrolix
Dec 28 2017 04:56
@elgatito What is "Positions:" in the log?
Okay fantastic, I think you're headed in the right direction; your guess is the same as mine. Let me do some confirmation
Why is there no "error reading torrent" in your kodi log files?
Matt Joiner
@anacrolix
Dec 28 2017 05:07
Also look at Torrent.onIncompletePiece. It's possible we have already told peers we had a piece, then it's been dropped from the cache and they're ignoring requests for it
It's strange that all these pieces that stall are in the readahead priority. The user shouldn't notice that unless their reads are stalled on it, which they shouldn't be unless the piece priority was "now", which would show as 1! in the client status
Denis
@elgatito
Dec 28 2017 05:23
Torrent.onIncompletePiece
I do see this is called and priorities are changed, so the piece comes up in the "next pieces" list, but can't see where chunks are queued in this case
It's strange that all these pieces that stop, are in the readahead priority
In my case, I make a reader that is responsible for ongoing reading/downloading, which is about 30% of memory, allocated for that torrent. So for streaming it downloads pieces from the tail of a reader
but there are situations when many pieces need to be redownloaded, for example audio track change, or simple video seek. More interesting why this situation is not happening all the time when we redownload pieces
Denis
@elgatito
Dec 28 2017 05:32
this is from this log - http://dropmefiles.com/B8lsn
02:13:43 T:1628  NOTICE: [plugin.video.elementum] DEBU  memory       ▶ ReadAt           No contain read: 97
02:13:43 T:1628  NOTICE: [plugin.video.elementum] DEBU  memory       ▶ MarkNotComplete  NotComplete: 97
02:13:43 T:1628  NOTICE: [plugin.video.elementum] error reading torrent "The Thinning (2016) HDRip" piece 97 offset 0, 9778 bytes: unexpected EOF
so here, first my storage says the buffer for the piece is allocated but it does not contain the requested chunk, and returns an error to the client; then it gets a request to mark the piece NotComplete, then the reader logs "error reading torrent" and updates priorities, and so on
in this specific log the Client somehow thinks the chunk is there; that is why I came to dirtyChunks management
Denis
@elgatito
Dec 28 2017 05:51

which they shouldn't be unless the piece priority was "now", which should look like 1! in the client status

oh, there are two situations with the same error in that log, piece 97 and piece 134

but you see, one is then errored by a reader, and one (try grepping for "No contain read") errors with "dropping first corresponding conn from trust"
and the first piece was later downloaded successfully, while the second failed again. The error comes from torrent.pieceHashed
Denis
@elgatito
Dec 28 2017 05:56
okay, I was looking at the wrong log position. This one is related to the stalled piece:
02:12:33 T:1628  NOTICE: [plugin.video.elementum] DEBU  memory       ▶ ReadAt           No contain read: 134
02:12:33 T:1628  NOTICE: [plugin.video.elementum] DEBU  memory       ▶ MarkNotComplete  NotComplete: 134
02:12:33 T:1628  NOTICE: [plugin.video.elementum] dropping first corresponding conn from trust: [62 76 87 98 117 124]
02:12:34 T:1628  NOTICE: [plugin.video.elementum] DEBU  memory       ▶ MarkComplete     Complete: 134
and that is interesting: a read request, then the piece is marked NotComplete, but the next request sets it Complete, and a second later the Client status shows this piece in the queue
Denis
@elgatito
Dec 28 2017 06:36
Now looking at this log - https://drive.google.com/file/d/1GWvj7xHcafOZUUNS0gx2s2eH95XtJ2z6/view
and i'm stuck. We got a Complete state (Complete: 4785), and within half a second this piece appears in the "next pieces" queue, and there are no errors at all
Matt Joiner
@anacrolix
Dec 28 2017 07:15
Have you compared your custom storage implementation to say the file or piece implementations?
In particular, I wonder if your implementation is correctly returning the Completion/Ok values; perhaps there's some timing issue that I haven't seen with my own storage stuff
Denis
@elgatito
Dec 28 2017 07:16
I was looking at filecache implementation
for Complete/NotComplete I just set a field on the Piece to true/false and return it from Completion
Matt Joiner
@anacrolix
Dec 28 2017 07:17
I've tried putting asserts in the chunk request algorithm (like catching requests for pieces that have all chunks dirty), but it seems to fail without any reason I can explain. I did a lot of testing on that algorithm tho
hm that looks quite reasonable
Denis
@elgatito
Dec 28 2017 07:20
I have dropped many LOC, coz there's no need to save state or anything like that. The Client does the verification; if it sets Complete then it's complete. When I drop the piece, I reset Size+Completed and the Chunks state
Matt Joiner
@anacrolix
Dec 28 2017 07:21
I'm not sure you want Piece to implement Write like that do you?
Denis
@elgatito
Dec 28 2017 07:21
it should be straightforward, but you see, there must be some basic error; maybe functions run in parallel
Matt Joiner
@anacrolix
Dec 28 2017 07:21
yeah
Denis
@elgatito
Dec 28 2017 07:22
Piece is read through a SectionReader, so Write is never called, but it's needed for the interface
Matt Joiner
@anacrolix
Dec 28 2017 07:22
what we know is that the client believes it needs a piece, but isn't requesting any chunks from it
Denis
@elgatito
Dec 28 2017 07:23
Now looking at this log - https://drive.google.com/file/d/1GWvj7xHcafOZUUNS0gx2s2eH95XtJ2z6/view
and i'm stuck. We got a Complete state (Complete: 4785), and within half a second this piece appears in the "next pieces" queue, and there are no errors at all
this one is confusing: it got a Complete request and then the piece comes back into the queue
Matt Joiner
@anacrolix
Dec 28 2017 07:24
ok, so the piece is marked complete, but isn't removed or filtered from pending pieces for some reason
What is Positions: ?
Denis
@elgatito
Dec 28 2017 07:27
It shows which Pieces are stored in memory
buffer_id <-> piece index
Matt Joiner
@anacrolix
Dec 28 2017 07:30
MarkComplete is only called in one place in the torrent code which will make this easier
Immediately after completing a piece, it will call Completion() on the piece. Is it possible the length differs or something else has changed?
In your MarkComplete, assert that the length is actually what's expected when it's called
p.Active && p.Completed && p.Size == p.Length && p.Length != 0
check that's true immediately after you set Completed=true
i'd also make Piece.Write panic, tho it's probably unrelated
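The suggested assertion could look like this minimal sketch. The Piece fields here (Active, Completed, Size, Length) are taken from the expression above and are illustrative, not the real elementum type:

```go
package main

import "fmt"

// Piece is a hypothetical in-memory storage piece; the field names
// come from the expression discussed above, not from any real type.
type Piece struct {
	Active    bool
	Completed bool
	Size      int64
	Length    int64
}

// MarkComplete sets Completed and immediately asserts the suggested
// invariant, panicking if the piece does not actually hold all bytes.
func (p *Piece) MarkComplete() error {
	p.Completed = true
	if !(p.Active && p.Completed && p.Size == p.Length && p.Length != 0) {
		panic("piece is not completed")
	}
	return nil
}

func main() {
	p := &Piece{Active: true, Size: 1 << 18, Length: 1 << 18}
	if err := p.MarkComplete(); err == nil {
		fmt.Println("invariant holds after MarkComplete")
	}
}
```

The panic is deliberate: as discussed below, a MarkComplete that "succeeds" on a piece that is not actually complete is a logic error, and the stack trace is the useful output.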
Denis
@elgatito
Dec 28 2017 07:33
return an error from MarkComplete() if I believe it's not yet complete?
Matt Joiner
@anacrolix
Dec 28 2017 07:33
no, panic
it's a logical error if it's not actually complete and the torrent thinks that it should be. at least as far as we currently believe
Denis
@elgatito
Dec 28 2017 07:34
to see the stacktrace?
Matt Joiner
@anacrolix
Dec 28 2017 07:35
yes, and the fact that nobody should have to handle that error
go programmers are a little paranoid about panic, it's unfortunate :P
one possible series of events (without knowing how your backend caching works) is that we do a series of writes to the piece from torrent, then we call MarkComplete, the piece gets thrown out of the cache, then we call Completion and get a different value than we expected
that would actually break torrent; it currently expects no change between those 2 events i think
Denis
@elgatito
Dec 28 2017 07:38
I have placed locks there to be sure we don't return something before setting Completed
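A minimal sketch of that locking pattern, guarding the completion flag with a mutex so a concurrent Completion call can never observe a half-updated state (fields are illustrative, not the real elementum type):

```go
package main

import (
	"fmt"
	"sync"
)

// Piece guards its completion state with a mutex. Any goroutine
// calling Completion sees either the state before or after a full
// MarkComplete, never an intermediate one.
type Piece struct {
	mu        sync.Mutex
	Completed bool
}

func (p *Piece) MarkComplete() error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.Completed = true
	return nil
}

func (p *Piece) Completion() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.Completed
}

func main() {
	p := &Piece{}
	p.MarkComplete()
	fmt.Println(p.Completion())
}
```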
Matt Joiner
@anacrolix
Dec 28 2017 07:38
on torrent.go line 1031, the if changed test could be commented out. it may affect performance, but it would also ignore any error introduced by pieces vanishing in this case i think
i don't know how easy it is for you to reproduce this, if you have to wait for users to trip on it, that's unfortunate
Denis
@elgatito
Dec 28 2017 07:40
the problem is that there is no reliable way to reproduce it; it happens from time to time, and the same torrent and same operation lead to different results
so yes, i will make a new version with a panic so we can see what changes there
Matt Joiner
@anacrolix
Dec 28 2017 07:41
i'd also try with those lines commented out
the information you'd get from those changes wouldn't overlap
Denis
@elgatito
Dec 28 2017 08:37
could it be related to there being more than one active reader at the moment of a lag?
they do run the update* functions in parallel
Denis
@elgatito
Dec 28 2017 09:11
looking again over the log from gdrive. there, around piece 4785, a message comes: dropping first corresponding conn from trust: [-1 193 264 308 491 654], and later Complete: 4785 is called
the message comes from pieceHashed with correct == false, which is fine
and again we come to torrent.onIncompletePiece
so we've come to pieceHashed there twice, first time with correct == false and later correct == true
Denis
@elgatito
Dec 28 2017 09:24
and step by step: verifyPiece->pieceHashed is called from queuePieceCheck; the call from torrent is not relevant, and we have one caller from piece and another from connection
    // It's important that the piece is potentially queued before we check if
    // the piece is still wanted, because if it is queued, it won't be wanted.
    if t.pieceAllDirty(index) {
        t.queuePieceCheck(int(req.Index))
    }
Matt Joiner
@anacrolix
Dec 28 2017 09:39
i don't follow your last observation
Denis
@elgatito
Dec 28 2017 09:41
I mean we come to pieceHashed two times, first with correct == false, second with correct == true; pieceHashed is called from verifyPiece, which is called from t.queuePieceCheck, which is called from connection.go
Matt Joiner
@anacrolix
Dec 28 2017 09:45
do you think that combination is causing a problem?
Denis
@elgatito
Dec 28 2017 09:49
no, just trying to understand: when there's only one place that triggers pieceHashed, how can it be called twice, and t.pieceAllDirty(index) also has to be true to pass
Matt Joiner
@anacrolix
Dec 28 2017 09:50
yep i follow, well spotted
notice how the chunk is marked as dirty (unpendpiecechunk), then the lock is released, then after it's written, it checks for hashing
Denis
@elgatito
Dec 28 2017 09:50
what happens if verification fails? if we suppose we have a piece but the hash is not correct, should it be redownloaded as a whole?
Matt Joiner
@anacrolix
Dec 28 2017 09:51
it does currently redownload the whole thing, as it has no way to know which chunk was bad
that happens due to bad/dishonest peers
i don't like that potential race there now tho
it's probably not incorrect, but i'm sure there's a more robust way to do it
Denis
@elgatito
Dec 28 2017 09:52
i feel it must be something with locks or chunk counters; there is really nothing else that could break the conditions
Matt Joiner
@anacrolix
Dec 28 2017 09:52
all of this is speculative
Denis
@elgatito
Dec 28 2017 09:53
yup, waiting for new logs with panic added
Matt Joiner
@anacrolix
Dec 28 2017 09:53
sweet
Denis
@elgatito
Dec 28 2017 09:54
not sure anyone on linux had this issue, I remember only users on android/windows
but that is probably because most users are on android/windows :)
Matt Joiner
@anacrolix
Dec 28 2017 09:54
ok, well i only run on linux and macos
yeah that's most likely the reason :P
Denis
@elgatito
Dec 28 2017 10:02
is t.pendingPieces re-filled when pieces need to be re-downloaded?
I see pendingPieces.Remove() when a piece is completed, but can't see it being re-added if something happens
Matt Joiner
@anacrolix
Dec 28 2017 10:04
Torrent.pendPiece
Denis
@elgatito
Dec 28 2017 10:06
yes, I see it's called on initialization and in the Download* calls, but I don't see other places that return pieces back
Matt Joiner
@anacrolix
Dec 28 2017 10:08
yeah you're right
Denis
@elgatito
Dec 28 2017 10:08
if that influences anything really
Matt Joiner
@anacrolix
Dec 28 2017 10:08
not in your use case, but it isn't ideal
DownloadAll would not reattempt pieces that have failed, it would appear
same with DownloadPieces
nah, it only ever removes a piece if it's completed successfully
if it's lost after that, it's the storage's problem, not the client's
Matt Joiner
@anacrolix
Dec 28 2017 10:25
i can see a few areas to improve the client, thanks to your observations
i think i'll move the write concurrency into the storage layer
so the client is more consistent
Denis
@elgatito
Dec 28 2017 11:36
have got a first report with panic - http://dropmefiles.com/MC74q
Denis
@elgatito
Dec 28 2017 11:56
nothing helpful there; the panic starts from where we already know
15:59:48.247 T:501857920080  NOTICE: [plugin.video.elementum] created by github.com/anacrolix/torrent.(*Torrent).queuePieceCheck
15:59:48.251 T:501857920080  NOTICE: [plugin.video.elementum]     /ext-go/2/src/github.com/anacrolix/torrent/torrent.go:1602 +0xb8
Matt Joiner
@anacrolix
Dec 28 2017 12:06
i thought that's what we wanted. it says it panicked in MarkComplete, which means it's calling that, and the later Completion() would have returned false
Denis
@elgatito
Dec 28 2017 12:07
@anacrolix, i thought it would give more stack info about how we got there
Matt Joiner
@anacrolix
Dec 28 2017 12:07
yeah it does
15:59:48.228 T:501857920080 NOTICE: [plugin.video.elementum] panic: piece is not completed
15:59:48.232 T:501857920080 NOTICE: [plugin.video.elementum]
15:59:48.234 T:501857920080 NOTICE: [plugin.video.elementum] goroutine 122411 [running]:
15:59:48.236 T:501857920080 NOTICE: [plugin.video.elementum] github.com/elgatito/elementum/storage/memory.(*Piece).MarkComplete(0x4420258cb0, 0x0, 0x0)
15:59:48.241 T:501857920080 NOTICE: [plugin.video.elementum] /ext-go/1/src/github.com/elgatito/elementum/storage/memory/piece.go:52 +0x16c
15:59:48.242 T:501857920080 NOTICE: [plugin.video.elementum] github.com/anacrolix/torrent.(*Torrent).pieceHashed(0x44200c6000, 0x263, 0x1)
15:59:48.244 T:501857920080 NOTICE: [plugin.video.elementum] /ext-go/2/src/github.com/anacrolix/torrent/torrent.go:1475 +0xfc
15:59:48.245 T:501857920080 NOTICE: [plugin.video.elementum] github.com/anacrolix/torrent.(*Torrent).verifyPiece(0x44200c6000, 0x263)
15:59:48.246 T:501857920080 NOTICE: [plugin.video.elementum] /ext-go/2/src/github.com/anacrolix/torrent/torrent.go:1571 +0x220
15:59:48.247 T:501857920080 NOTICE: [plugin.video.elementum] created by github.com/anacrolix/torrent.(*Torrent).queuePieceCheck
15:59:48.251 T:501857920080 NOTICE: [plugin.video.elementum] /ext-go/2/src/github.com/anacrolix/torrent/torrent.go:1602 +0xb8
that's heaps
Denis
@elgatito
Dec 28 2017 12:08
i'm now trying another approach, reversing from the other side, the reader's side
so if the piece is in the "next pieces" list, why isn't it downloaded?
there is a connection.desiredRequestState(), which i'm trying to understand
and this piece of code:
        func(piece int, f func(chunkSpec) bool) bool {
            return undirtiedChunks(piece, cn.t, f)
        },
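As a self-contained sketch of what a callback like that iterates over: the real undirtiedChunks in anacrolix/torrent works off the connection's Torrent, but the shape is roughly "walk the piece's chunks, skip the dirty ones, stop early if the visitor returns false". Here a plain map stands in for the client's dirty-chunk state, and the names are illustrative:

```go
package main

import "fmt"

// chunkSpec is a stand-in for the client's chunk descriptor.
type chunkSpec struct{ Begin, Length int }

// undirtiedChunks visits every chunk of a piece that has not been
// marked dirty (requested/written), in order, stopping early if the
// visitor f returns false. It reports whether the walk completed.
func undirtiedChunks(dirty map[int]bool, numChunks, chunkSize int, f func(chunkSpec) bool) bool {
	for i := 0; i < numChunks; i++ {
		if dirty[i] {
			continue // already dirty, nothing to request
		}
		if !f(chunkSpec{Begin: i * chunkSize, Length: chunkSize}) {
			return false
		}
	}
	return true
}

func main() {
	// chunks 0 and 2 are dirty, so only 1 and 3 are yielded
	dirty := map[int]bool{0: true, 2: true}
	undirtiedChunks(dirty, 4, 16<<10, func(cs chunkSpec) bool {
		fmt.Println(cs.Begin)
		return true
	})
}
```

If a piece's chunks are all marked dirty but the piece never gets hashed or completed, a walker like this yields nothing, which matches the observed symptom: the piece is wanted but no chunks are requested from it.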
Matt Joiner
@anacrolix
Dec 28 2017 12:13
in pieceHashed, it calls MarkComplete. immediately after that, it calls updatePieceCompletion, which calls pieceCompletedUncached. that last call is not returning complete==true, as shown by the panic; marking it complete does not mean it will be complete in that later call. if that call doesn't return true, updatePieceCompletion breaks the state of the piece: it decides that "changed" is false, and leaves the piece with all chunks dirty, and still pending.
the true error is that your MarkComplete is not actually guaranteeing that the later Completion call is true
but there are plenty of poor coding choices in the client implementation that are letting this happen and not coping with it
Denis
@elgatito
Dec 28 2017 12:16
what if I set p.Ok = p.Active && p.Size == p.Length && p.Length != 0 in MarkComplete
then return it from Completion?
not a static true
so if MarkComplete is failing, I save that state, and later when Completion is called I return Ok; this says something is wrong and triggers the state-change chain?
Matt Joiner
@anacrolix
Dec 28 2017 12:21
i think your original idea of returning an error is the best step for now. it's imperative that MarkComplete succeeding means the following call to Completion returns Complete: true, Ok: true
so if the piece isn't actually complete when you call MarkComplete for whatever reason, tell the client there and then by returning an error immediately.
in a pure memory storage backend, Ok should always be true as well. Ok represents the state as known, the state can't be unknown in memory
i'm going to modify updatePieceCompletion to not poll completion a second time, removing the possibility of that bug altogether
wow, clever gitter :)
so we check whether it's really completed in MarkComplete(); if not, we don't set Completed, and we explicitly return p.Completed from Completion
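A sketch of that approach: MarkComplete refuses (returns an error) when the buffer does not actually hold the whole piece, and Completion reports the stored flag rather than a static true. The Completion struct here mirrors the shape of storage.Completion in anacrolix/torrent; the Piece fields are illustrative:

```go
package main

import (
	"errors"
	"fmt"
)

// Completion mirrors storage.Completion in anacrolix/torrent:
// Complete reports the state, Ok whether the state is known.
type Completion struct {
	Complete bool
	Ok       bool
}

// Piece is an illustrative in-memory piece, not the real elementum type.
type Piece struct {
	Active    bool
	Completed bool
	Size      int64
	Length    int64
}

var errShortPiece = errors.New("piece data is incomplete")

// MarkComplete only sets Completed when the buffer actually holds the
// whole piece; otherwise it reports the failure to the client.
func (p *Piece) MarkComplete() error {
	if !(p.Active && p.Size == p.Length && p.Length != 0) {
		return errShortPiece
	}
	p.Completed = true
	return nil
}

// Completion returns the stored flag rather than a static true. In a
// pure memory backend the state is always known, so Ok is always true.
func (p *Piece) Completion() Completion {
	return Completion{Complete: p.Completed, Ok: true}
}

func main() {
	p := &Piece{Active: true, Size: 1 << 18, Length: 1 << 18}
	fmt.Println(p.MarkComplete(), p.Completion().Complete)
}
```

This preserves the invariant discussed above: a successful MarkComplete guarantees the following Completion call returns Complete: true, Ok: true.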
Matt Joiner
@anacrolix
Dec 28 2017 12:54
definitely the MarkComplete looks correct
i don't know enough about your implementation for Completion
i've also tracked down a single line "fix" for this in anacrolix/torrent
Denis
@elgatito
Dec 28 2017 12:55
a month of struggles for a one-line change :D
Matt Joiner
@anacrolix
Dec 28 2017 12:55
:D
Denis
@elgatito
Dec 28 2017 12:56
will you push it?
Matt Joiner
@anacrolix
Dec 28 2017 12:56
yeah i was going to try it in production on my end first
i'll push it
Denis
@elgatito
Dec 28 2017 12:59
great. please let me know when you make the change, so I can trigger a release and ask people to test
Matt Joiner
@anacrolix
Dec 28 2017 12:59
i'd hate to get your hopes up, but i think it should help
anacrolix/torrent@898db89
i'm pushing it to prod on my end now
Denis
@elgatito
Dec 28 2017 13:16
thanks! making a new version for test
Denis
@elgatito
Dec 28 2017 13:29
btw, the UserAgent setting works fine. tested on a closed tracker; they need UserAgent+PeerID to match and to belong to an allowed list
Denis
@elgatito
Dec 28 2017 16:19
First report of a stall while changing the audio track: http://dropmefiles.com/m4NCb
On windows
I have not seen it myself so far
Denis
@elgatito
Dec 28 2017 18:08
hm.. somehow the piece length is 909737984 or 609730560 or other strange values, will debug my own code
looking over that log, two pieces get errors from the piece length not matching, yet the size increment is pretty straightforward:
func (p *Piece) WriteAt(b []byte, off int64) (n int, err error) {
    ...
    p.Size += int64(n)
    return
}
and that is on windows_x86
and n = copy(p.c.buffers[p.Position][off:], b[:])
Denis
@elgatito
Dec 28 2017 18:15
interesting how Size can become that value, given that Size is truncated on delete and set to 0 on start
probably I will try recalculating the size on a size mismatch, but I have a feeling it can be related to the 32-bit system
/somehow/
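One way an accumulator like p.Size += int64(n) can drift is overlapping or repeated WriteAt calls double-counting bytes. A sketch of the recalculation idea: derive the size from a set of written chunks instead of summing write lengths, so a rewritten chunk cannot inflate it. Types, names, and sizes here are illustrative, not the real elementum storage:

```go
package main

import "fmt"

const chunkSize = 16 << 10 // standard BitTorrent request size

// Piece tracks which chunks have been written instead of summing
// byte counts, so a duplicate write cannot inflate the size.
type Piece struct {
	Length  int64
	written map[int64]bool // chunk index -> written
}

func NewPiece(length int64) *Piece {
	return &Piece{Length: length, written: make(map[int64]bool)}
}

// WriteAt records the chunk as written; duplicates are idempotent.
// (Copying into the backing buffer is elided in this sketch.)
func (p *Piece) WriteAt(b []byte, off int64) (int, error) {
	p.written[off/chunkSize] = true
	return len(b), nil
}

// Size is recomputed from the chunk map rather than accumulated,
// capped at the piece length for a short final chunk.
func (p *Piece) Size() int64 {
	var n int64
	for range p.written {
		n += chunkSize
	}
	if n > p.Length {
		n = p.Length
	}
	return n
}

func main() {
	p := NewPiece(32 << 10)
	buf := make([]byte, chunkSize)
	p.WriteAt(buf, 0)
	p.WriteAt(buf, 0) // duplicate write does not double-count
	p.WriteAt(buf, chunkSize)
	fmt.Println(p.Size())
}
```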
Denis
@elgatito
Dec 28 2017 18:35
Not contains read: 139, Stats: 1474560-2097152, Completed: false, Chunk: 3 (2-3), Request: 32768, len(32768)
aha.. so at some moment MarkComplete failed and I had to reset the size and chunks index, because the client will redownload the whole piece
Denis
@elgatito
Dec 28 2017 19:04
but there is one positive thing: the piece is not stuck, it just gets downloaded again and again
so I have hope it will start working
Matt Joiner
@anacrolix
Dec 28 2017 23:29
Excellent