These are chat archives for anacrolix/torrent

2nd
Jun 2017
Denis
@elgatito
Jun 02 2017 07:21
regarding restored torrents (partially or completely downloaded), it takes a long time to check all the pieces. The easiest way I can see is to add a setter func to Torrent to change piece properties
something like struct{Complete, Partial, Checked}, without priority
Matt Joiner
@anacrolix
Jun 02 2017 09:05
hmmm, i'm surprised you don't get that from the piece completion
what are you seeing taking a long time? a profile is best
Denis
@elgatito
Jun 02 2017 09:09
I'm talking about restoring the state of a torrent after an app restart
Matt Joiner
@anacrolix
Jun 02 2017 09:09
it should be quite fast if the state was retained in the piece completion db
Denis
@elgatito
Jun 02 2017 09:09
save state on close, then restore piece state on start
a 700-piece file takes about 10 seconds to check
Matt Joiner
@anacrolix
Jun 02 2017 09:10
how big are the pieces?
Denis
@elgatito
Jun 02 2017 09:10
bolt completion is passed, database fills up with data
2mb
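For a rough sense of scale, the figures above (700 pieces, 2 MB each, about 10 seconds) can be turned into an implied data rate. This is only a back-of-the-envelope sketch that assumes every piece is read and hashed in full; `hashThroughputMBps` is a made-up helper name.

```go
package main

import "fmt"

// hashThroughputMBps returns the data rate implied by checking `pieces`
// pieces of `pieceMB` megabytes each in `seconds` seconds, assuming every
// piece is processed in full.
func hashThroughputMBps(pieces int, pieceMB, seconds float64) float64 {
	return float64(pieces) * pieceMB / seconds
}

func main() {
	// Figures from the chat: 700 pieces, 2 MB each, about 10 seconds.
	fmt.Printf("%.0f MB total, ~%.0f MB/s\n", 700*2.0, hashThroughputMBps(700, 2, 10))
}
```

Around 140 MB/s over 1.4 GB would be plausible for a CPU-bound hash pass, which would fit a check that hashes data rather than one that only consults a completion database.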
Matt Joiner
@anacrolix
Jun 02 2017 09:10
what do you mean it's passed?
Denis
@elgatito
Jun 02 2017 09:11
I mean: DefaultStorage: storage.NewFileWithCompletion(config.Get().DownloadPath, s.PieceCompletion),
Matt Joiner
@anacrolix
Jun 02 2017 09:12
oh right
Denis
@elgatito
Jun 02 2017 09:12
where s.PieceCompletion is storage.NewBoltPieceCompletion(config.Get().ProfilePath)
I see boltdb is getting bigger, so it's writing there
Matt Joiner
@anacrolix
Jun 02 2017 09:12
yeah it's hard to debug bolt, i don't know why they don't provide some good tools for reading its contents
can u get a cpu profile for restoring a torrent? are you seeing a lot of cpu, or is it a lot of disk?
Denis
@elgatito
Jun 02 2017 09:26
disk usage is low during checking
cpu is struggling
Matt Joiner
@anacrolix
Jun 02 2017 09:27
sounds like maybe the boltdb isn't working correctly
Denis
@elgatito
Jun 02 2017 10:53
         0     0% 33.88%      3.88s 16.70%  github.com/anacrolix/torrent.(*Torrent).setInfoBytes.func1
     0.01s 0.043% 33.92%      3.88s 16.70%  github.com/anacrolix/torrent.(*Torrent).verifyPiece
         0     0% 33.92%      3.87s 16.66%  github.com/anacrolix/torrent.(*Torrent).hashPiece
boltdb not even in top50
Matt Joiner
@anacrolix
Jun 02 2017 12:09
i meant that the piece completion data stored in boltdb might not be used
that definitely appears to indicate that it's hashing data
Denis
@elgatito
Jun 02 2017 12:11
getters/setters in bolt piececompletion are called
but making hash for each piece index takes that much time
Matt Joiner
@anacrolix
Jun 02 2017 12:12
try commenting out torrent.go lines 287 to 291
it looks like if you only had a few big files in your torrent, and u weren't near completion when you stopped the torrent, it will hash all the incomplete pieces again
even though they're all zeroed out data
let me know how you go without that goroutine running, it probably shouldn't be in there anymore
maybe this is one reason there's so much interest in pause/resume networking
other people might be triggering this cost, while i am not in my use case
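The behaviour described above can be sketched in isolation. This is not the library's code: `completion` and `startupCheck` are hypothetical stand-ins, and the only thing the sketch assumes is that pieces marked complete are trusted while every other piece is hashed, including pieces that are still all zeros.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"fmt"
)

// completion is a hypothetical stand-in for a piece completion store;
// it maps a piece index to "known complete".
type completion map[int]bool

// startupCheck hashes every piece that is not marked complete and returns
// how many pieces it had to hash. Zero-filled pieces that were never
// downloaded still cost a full SHA-1 pass.
func startupCheck(data []byte, pieceLen int, want [][sha1.Size]byte, done completion) int {
	hashed := 0
	for i := range want {
		if done[i] {
			continue // trusted from the completion store, no hashing
		}
		start := i * pieceLen
		end := start + pieceLen
		if end > len(data) {
			end = len(data)
		}
		hashed++
		if sha1.Sum(data[start:end]) == want[i] {
			done[i] = true
		}
	}
	return hashed
}

// demo builds a 4-piece torrent where piece 0 is marked complete, piece 1
// is on disk but unmarked, and pieces 2 and 3 are still zeroed out.
func demo() (int, completion) {
	const pieceLen = 16
	full := bytes.Repeat([]byte("0123456789abcdef"), 4)
	want := make([][sha1.Size]byte, 4)
	for i := range want {
		want[i] = sha1.Sum(full[i*pieceLen : (i+1)*pieceLen])
	}
	onDisk := make([]byte, len(full))
	copy(onDisk, full[:2*pieceLen])
	done := completion{0: true}
	return startupCheck(onDisk, pieceLen, want, done), done
}

func main() {
	hashed, done := demo()
	fmt.Println("pieces hashed:", hashed, "pieces complete:", len(done))
}
```

In this toy case three of the four pieces get hashed, two of them for nothing, which is the cost that grows with a large, mostly incomplete torrent.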
Denis
@elgatito
Jun 02 2017 12:18
commenting those lines made pieces all marked with Checking = true, so can't progress
Matt Joiner
@anacrolix
Jun 02 2017 12:19
sorry, also comment out line 285
Denis
@elgatito
Jun 02 2017 12:20
yupp, already compiling
Denis
@elgatito
Jun 02 2017 12:34
double-checked with those lines commented out. the availability check now passes in less than a second
before it was about a minute or more
if there's boltdb already (if it's really used on runtime), we can just store all hashes ([]byte) into boltdb
for 20-30gb torrents it will probably take more and more time?
Samuel
@iamacarpet
Jun 02 2017 12:46
Guys, if it helps, I was seeing this on a freshly created file, when it wasn't a resume.
Would you only expect it when it was a resumed download?
While writing a storage driver, I noticed it checks the pieces as they complete anyway, so there would be no reason to check the whole file as well if it's not a resume, would there?
(using @elgatito version of Quasar by the way, not using the library directly - didn't notice this with go-peerflix, although it doesn't report the progress as thoroughly).
Matt Joiner
@anacrolix
Jun 02 2017 14:09
yeah it appears it would hash any data present in pieces that aren't marked complete in the piece completion db. this would include on restarts presently. or even if the data was already there and hadn't been hashed before, it will check.
@elgatito just to confirm, the behaviour you're seeing with those lines commented out is what you wanted?
@iamacarpet the reason it currently checks on start up, is there's no way to tell if the data is already correct or not if someone else modified the files. it would seem reasonable to defer that decision to the user of the package perhaps. i'm seeing a use case for a "Torrent.VerifyData" or "Torrent.VerifyPiece(int)" looking good
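A minimal sketch of what deferring that decision to the caller could look like. The method names `VerifyPiece` and `VerifyData` come from the proposal above, but the `verifier` type, its fields and the signatures are assumptions for illustration only, not the library's API.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// verifier is a hypothetical type; it is not part of anacrolix/torrent.
type verifier struct {
	pieceLen int
	data     []byte
	hashes   [][sha1.Size]byte
	complete []bool
}

// VerifyPiece hashes a single piece on request and records the result.
func (v *verifier) VerifyPiece(i int) bool {
	start := i * v.pieceLen
	end := start + v.pieceLen
	if end > len(v.data) {
		end = len(v.data)
	}
	v.complete[i] = sha1.Sum(v.data[start:end]) == v.hashes[i]
	return v.complete[i]
}

// VerifyData verifies every piece and returns how many are good. Nothing
// is hashed unless the caller asks for it.
func (v *verifier) VerifyData() int {
	good := 0
	for i := range v.hashes {
		if v.VerifyPiece(i) {
			good++
		}
	}
	return good
}

// demo records the expected hashes, then simulates someone else modifying
// the file before verification is requested.
func demo() (firstOK bool, good int) {
	data := []byte("aaaabbbbcccc")
	v := &verifier{pieceLen: 4, data: data, complete: make([]bool, 3)}
	for i := 0; i < 3; i++ {
		v.hashes = append(v.hashes, sha1.Sum(data[i*4:(i+1)*4]))
	}
	data[5] = 'X' // piece 1 changed behind our back
	return v.VerifyPiece(0), v.VerifyData()
}

func main() {
	firstOK, good := demo()
	fmt.Println("piece 0 ok:", firstOK, "good pieces:", good)
}
```

A caller that trusts its completion data would simply never call either method, while one that suspects the files were touched could call `VerifyData` once at startup.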
Samuel
@iamacarpet
Jun 02 2017 14:20
@anacrolix that sounds reasonable to me, am I right to think it'll check the piece again as it is read by the reader anyway, or does it only check the piece on completion of the piece download?
This might be something required for the work @elgatito wants to do creating an in-memory only storage driver, containing only a small subset of pieces at a time (100-200MB limit), to allow streaming without the need for temporary files or backing storage. Although I suppose the storage driver returning no for all those pieces would be fairly quick if it was just checking an in memory map anyway...

Oh and regarding:

"it would hash any data present in pieces that aren't marked complete in the piece completion db."
The reason I wasn't seeing this with go-peerflix is because I only tried it with the FAT32 compatible driver, where each piece was its own file, so the check was really fast for pieces that hadn't been downloaded, as it would get a file-not-exist error. But since the standard file driver is doing a pre-allocate when it is created (isn't it?), it'll read all 0s while checking (I assume).
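The difference can be sketched with one file per piece. `checkPieceFile` is a hypothetical helper, not the FAT32 driver's code; it only assumes that a missing piece file can be treated as "not downloaded" without reading or hashing anything.

```go
package main

import (
	"crypto/sha1"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// checkPieceFile reports whether the piece stored at path matches want.
// hashed is false when the file does not exist, so no data was read.
func checkPieceFile(path string, want [sha1.Size]byte) (ok, hashed bool, err error) {
	b, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return false, false, nil // nothing on disk, nothing to hash
	}
	if err != nil {
		return false, false, err
	}
	return sha1.Sum(b) == want, true, nil
}

// demo writes piece "0" to a temp dir and leaves piece "1" missing.
func demo() (presentOK, presentHashed, missingHashed bool, err error) {
	dir, err := os.MkdirTemp("", "pieces")
	if err != nil {
		return
	}
	defer os.RemoveAll(dir)

	piece := []byte("piece zero data")
	want := sha1.Sum(piece)
	if err = os.WriteFile(filepath.Join(dir, "0"), piece, 0o600); err != nil {
		return
	}
	presentOK, presentHashed, err = checkPieceFile(filepath.Join(dir, "0"), want)
	if err != nil {
		return
	}
	_, missingHashed, err = checkPieceFile(filepath.Join(dir, "1"), want)
	return
}

func main() {
	fmt.Println(demo())
}
```

With a single pre-allocated file there is no such shortcut: the bytes for an undownloaded piece exist (as zeros), so they get read and hashed like any other data.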

Denis
@elgatito
Jun 02 2017 14:43
@anacrolix, I'm not sure about the other piece state values, and I hadn't touched the files, so there's no need to verify
Imho, there are no easy ways to verify files. Maybe a loadstate/savestate import/export for whole torrent, to call manually and do whatever you want
Denis
@elgatito
Jun 02 2017 15:04
As long as we already have information about piece completion it only needs a check to verify the whole file is the same as it was previously on close. That would be enough and less painful, probably
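One cheap way to approximate "the file is the same as it was on close" is to save the file's size and modification time and compare them on start. This is a sketch of that idea only: `fingerprint`, `snapshot` and `unchanged` are hypothetical names, and size plus mtime is a heuristic that does not prove the contents are identical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// fingerprint is a hypothetical record saved when the app closes.
type fingerprint struct {
	Size    int64
	ModTime time.Time
}

// snapshot captures the current size and modification time of path.
func snapshot(path string) (fingerprint, error) {
	fi, err := os.Stat(path)
	if err != nil {
		return fingerprint{}, err
	}
	return fingerprint{Size: fi.Size(), ModTime: fi.ModTime()}, nil
}

// unchanged reports whether path still matches the saved fingerprint.
func unchanged(path string, saved fingerprint) (bool, error) {
	now, err := snapshot(path)
	if err != nil {
		return false, err
	}
	return now.Size == saved.Size && now.ModTime.Equal(saved.ModTime), nil
}

// demo checks a file right after the snapshot, then again after its size
// has changed.
func demo() (before, after bool, err error) {
	dir, err := os.MkdirTemp("", "fp")
	if err != nil {
		return
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "data.bin")
	if err = os.WriteFile(path, []byte("hello"), 0o600); err != nil {
		return
	}
	saved, err := snapshot(path)
	if err != nil {
		return
	}
	if before, err = unchanged(path, saved); err != nil {
		return
	}
	if err = os.WriteFile(path, []byte("hello, world"), 0o600); err != nil {
		return
	}
	after, err = unchanged(path, saved)
	return
}

func main() {
	fmt.Println(demo())
}
```

If the fingerprint matches, the stored piece completion could be trusted as-is; if it doesn't, falling back to a full hash check would be the safe choice.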
Matt Joiner
@anacrolix
Jun 02 2017 15:43
what you mention about pieces being their own file avoiding this issue is spot on.
yeah i think i'll remove the up front check. a long time ago, the client used to lazily hash pieces, but that caused weirdness with some users wanting to know up front how much data they had. "i haven't looked" doesn't work so well for them.