These are chat archives for facebook/wdt

27th
Aug 2015
John Bresnahan
@buzztroll
Aug 27 2015 07:11

GridFTP was designed to use parallel TCP streams. It was later extended to use UDT, but that never became a popular option. There are C libraries for both the client and server side. However, it started out as a research project and many white papers were published on it, so I was asking whether you were familiar with those concepts and whether they helped in this effort at all.

You mentioned that you do not have to traverse NATs, which has me more curious about the target for this protocol. At first I was assuming that this was for wide area transfer to general internet users, but now I think it is for local area transfer between internal components. This is a very interesting effort and I would love to read more about it. Can you point me at anything?

Thanks for the reply!

Laurent Demailly
@ldemailly
Aug 27 2015 17:48
I came up with the idea independently, even though I see it's not a new idea - I've been buried heads down in the industry for many years so I haven't kept up with research. We use it inside Facebook and most of the value is when going across data centers - we have data centers for instance in Sweden and on the US West Coast - transferring data across a wide area network is something useful to optimize. Relatively high latency (>100-200ms) means our streaming helps. We don't really have papers or even blog posts describing the use case, no - not yet (for now we use it and publish the code).
John Bresnahan
@buzztroll
Aug 27 2015 19:34
If your data sets are large, it is a very similar use case. We dealt with large scientific data sets (sometimes partitioned into many small files) across fast wide area networks. I recommend looking into that research, and that of others at the time (Tsunami, bbcp, SRB). You could really benefit from the lessons learned even if you want to roll your own software.
Laurent Demailly
@ldemailly
Aug 27 2015 20:39
Agreed, do you have a pointer to said lessons learned?
(or starting point - ideally without having to read too many papers ;-) )