Vincent Cantin
@green-coder
To fight that, you can also add some randomly sized trash after the nonce.
Dawid Ciężarkiewicz
@dpc
This has a drawback: chunks are no longer deterministic.
So if you hash the same data twice, you will get a different data file each time, and deduplication doesn't work.
Vincent Cantin
@green-coder
You are right. The nonce should be a salt value which you keep with the repo's secret.
Dawid Ciężarkiewicz
@dpc
Then you will have to specify the password to add more data.
Vincent Cantin
@green-coder
maybe a hash of the password.
I see.
Dawid Ciężarkiewicz
@dpc
Then to automate it, you'd have to write down that password in clear text somewhere anyway. Though it would protect e.g. your off-site backups.
Vincent Cantin
@green-coder
So you want the program which adds chunks to the repo to know if the chunks are already there, but at the same time you don't want Alice or Bob to be able to do the same. It looks impossible.
Dawid Ciężarkiewicz
@dpc
After spending some time thinking about it, I just figured out that it's not a big deal, and I prefer bigger chunks anyway.
Yeah, it looks impossible to me too. :)
Vincent Cantin
@green-coder
Unless there is a secret used when you add data. Like a secret salt.
Dawid Ciężarkiewicz
@dpc
Though if you figured out something, I'd be delighted to know about it.
Vincent Cantin
@green-coder
If the salt is used only on the computer which adds data, it is still a practical solution.
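One possible reading of the secret-salt idea above, as a minimal sketch: derive each chunk's identifier with a keyed hash, so that whoever holds the salt gets the same identifier for the same data every time, while Alice or Bob cannot compute it without the salt. The function name `chunk_id` and the salt value are hypothetical, and this is not how rdedup is stated to work anywhere in this conversation.

```python
import hashlib
import hmac


def chunk_id(secret_salt: bytes, chunk: bytes) -> str:
    """Keyed digest of a chunk: deterministic for the salt holder only."""
    return hmac.new(secret_salt, chunk, hashlib.sha256).hexdigest()


salt = b"repo-secret-salt"  # hypothetical secret kept with the repo
chunk = b"same data, hashed twice"

# Same salt + same data -> same identifier, so deduplication still works.
same_twice = chunk_id(salt, chunk) == chunk_id(salt, chunk)

# Without the salt, a plain hash of the data does not reveal the identifier.
differs_from_plain = chunk_id(salt, chunk) != hashlib.sha256(chunk).hexdigest()

# A different salt gives an unrelated identifier for the same data.
differs_with_other_salt = chunk_id(salt, chunk) != chunk_id(b"other-salt", chunk)

print(same_twice, differs_from_plain, differs_with_other_salt)
```

The trade-off raised in the chat still applies: any machine that adds data needs access to the salt, so automating backups means storing it somewhere on that machine.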
Dawid Ciężarkiewicz
@dpc
In my backup setup, every computer uses rdup + rdedup to back up its own data, which all lands in one directory synced by Syncthing.
Vincent Cantin
@green-coder
@dpc If I use something like LevelDB or SQLite to store the chunks and nodes, would it sync in your use case?
Also, I think Syncthing is probably chunking and hashing the files before sending them over the network. That is somewhat wasteful, given that we already have that information. It would be better if we could propagate the data ourselves.
At some point, I might implement that system.
Dawid Ciężarkiewicz
@dpc
Hah!
I'm not sure if Dropbox / Syncthing does any deduplication during transmission.
I would bet it doesn't
P2P sync is a bigger project than dedup backups.
Vincent Cantin
@green-coder
According to its documentation, it uses fixed-size blocks with the BEP protocol, so it hashes each of those blocks.
I think that BEP was not designed for large files which are intended to change.
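To illustrate why fixed-size blocks cope poorly with files that change, here is a toy sketch that compares them with content-defined chunking when a single byte is inserted at the start of a file. It is a simplified model under stated assumptions: `fixed_blocks`, `cdc_chunks`, and `shared` are hypothetical helpers, the block size and mask are arbitrary, and the chunker recomputes a SHA-256 over each window instead of using a real rolling hash. It is neither BEP nor rdedup's chunker.

```python
import hashlib
import random


def fixed_blocks(data: bytes, size: int = 64):
    """Split data into blocks of a fixed size (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes, window: int = 16, mask: int = 0x3F):
    """Toy content-defined chunking: cut wherever a hash of the last
    `window` bytes has its low bits equal to zero."""
    chunks, start = [], 0
    for i in range(window, len(data) + 1):
        digest = hashlib.sha256(data[i - window:i]).digest()
        if int.from_bytes(digest[:4], "big") & mask == 0:
            chunks.append(data[start:i])
            start = i
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk without a cut point
    return chunks


def shared(a, b) -> int:
    """Number of distinct chunks that appear in both lists."""
    return len(set(a) & set(b))


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(8192))
edited = b"X" + original  # one byte inserted at the front

shared_fixed = shared(fixed_blocks(original), fixed_blocks(edited))
shared_cdc = shared(cdc_chunks(original), cdc_chunks(edited))

print("fixed-size blocks reused:", shared_fixed)
print("content-defined chunks reused:", shared_cdc)
```

With fixed-size blocks the insertion shifts every block boundary, so practically nothing is reused. With content-defined boundaries only the chunk containing the edit changes, because every later cut point depends on the surrounding bytes and not on the absolute offset.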
j
@whatiselsethere_twitter

Hi @dpc ! Question about rdedup:

The CHUNK_SIZE setting only works if CHUNKING is present.
So if I write rdedup init --chunk-size 15M
(not rdedup init --chunking bup --chunk-size 15M)
– it has no effect.
Is that intentional?

Dawid Ciężarkiewicz
@dpc
Looks like a bug to me
I haven't thought it through. :)
Dawid Ciężarkiewicz
@dpc
@whatiselsethere_twitter Can you please report on github? I should get this fixed. Thanks for bringing it up!
Also, rdedup has its own Gitter channel.
j
@whatiselsethere_twitter
@dpc will do.
Thank you!
Krzysztof Szumny
@noisy
@dpc hey, just out of curiosity, is this your account? https://steemit.com/blog/@dpc/this-is-just-a-test-please-ignore
Dawid Ciężarkiewicz
@dpc
Yes, it is.
Ruben De Smet
@rubdos
@dpc: you mean this gitter channel? :-)
Dawid Ciężarkiewicz
@dpc
@rubdos Exactly
Ruben De Smet
@rubdos
good. I'll have more time tomorrow. It's getting late. :-)
James Sewell
@jamessewell
Hi dpc. I’m about to use your fastcdc implementation to replace adler32 rolling
Would I just use RollingHash?
.roll then .digest?
Dawid Ciężarkiewicz
@dpc
Which crate are you looking at?
There's a bit of a mess right now, and rdedup has its own subcrate.
Damn it. I've lost track of where I keep the source code for rdedup-cdc.
Wonderful :D
Dawid Ciężarkiewicz
@dpc
The benchmarks show how to use it. Another place to look is rdedup.
James Sewell
@jamessewell
Oh right I was looking in another place!
Will have a peek
Dawid Ciężarkiewicz
@dpc
Ping
matrixbot
@matrixbot
@dpc:matrix.org Perfect