These are chat archives for rust-lang/rust

7th
Nov 2016
Matt Brubeck
@mbrubeck
Nov 07 2016 17:05
Transmuting a Vec wouldn't work correctly in this case because the length and capacity would become wrong.
To re-interpret it correctly you could use Vec::as_ptr and slice::from_raw_parts, making sure to adjust the length (divide the byte count by the size of the new element type).
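A minimal sketch of the as_ptr + from_raw_parts approach for the u8 to u32 case. The function name `as_u32_slice` and the `Aligned` demo type are hypothetical. The function takes a `&[u8]`, so a `&Vec<u8>` can be passed directly through deref coercion. It checks alignment at run time instead of assuming it, and it only borrows the bytes, so the original Vec keeps ownership of the allocation and frees it with the correct layout.

```rust
/// Reinterprets a byte slice as a slice of `u32`s without copying.
///
/// Panics if the data is not 4-byte aligned or its length is not a multiple
/// of 4. Values are read in the machine's native byte order.
fn as_u32_slice(bytes: &[u8]) -> &[u32] {
    let size = std::mem::size_of::<u32>();
    let align = std::mem::align_of::<u32>();
    assert_eq!(bytes.as_ptr() as usize % align, 0, "data is not aligned for u32");
    assert_eq!(bytes.len() % size, 0, "length is not a multiple of 4");
    // SAFETY: alignment and length were checked above, every bit pattern is
    // a valid u32, and the returned slice borrows `bytes`, so it cannot
    // outlive the underlying buffer.
    unsafe { std::slice::from_raw_parts(bytes.as_ptr() as *const u32, bytes.len() / size) }
}

/// Byte array forced onto a 4-byte boundary (hypothetical, demo only).
#[repr(C, align(4))]
struct Aligned([u8; 8]);

fn main() {
    let buf = Aligned([1, 0, 0, 0, 2, 0, 0, 0]);
    let words = as_u32_slice(&buf.0);
    // On little-endian machines such as x86-64 this is [1, 2].
    println!("{:?}", words);
}
```

The length passed to `from_raw_parts` is the byte count divided by 4, which is the "adjust the length" step; passing the unadjusted byte count would make the slice claim four times more memory than exists.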
David Harvey-Macaulay
@alteous
Nov 07 2016 19:35
@mbrubeck That's precisely what I'm looking to achieve, however my concern is that the re-interpreted data will not be correctly aligned when reading 2 or 4 bytes at a time. I am coming from a C background and I don't know if Rust takes care of this detail already.
Matt Brubeck
@mbrubeck
Nov 07 2016 19:45
@Alteous That's a valid concern. In practice I'd expect Vec's allocation to always be on a 4-byte boundary on the platforms I use, but if you want this guaranteed then you could either use the alloc APIs, or start by allocating a Vec<u32>.
David Harvey-Macaulay
@alteous
Nov 07 2016 20:13
@mbrubeck If that's the case then it will save a lot of hassle! I'll use the code below if I run into problems (which admittedly is unlikely on a modern CPU), but for now maybe it's just worth throwing in a runtime assertion, assert!((buffer.as_ptr() as usize) % 4 == 0), which isn't failing for me. Thanks for your help!
fn main() {
    use std::io::Read;

    let path = "data.bin";
    let n_bytes = 1024;
    let mut file = std::fs::File::open(path).expect("File not found");
    // Allocate u32 storage so the buffer starts on a 4-byte boundary.
    let mut buffer = Vec::<u32>::with_capacity(1 + (n_bytes / 4));
    unsafe {
        // View the reserved space as raw bytes and fill the first n_bytes from the file.
        let slice = std::slice::from_raw_parts_mut(buffer.as_mut_ptr() as *mut u8, n_bytes);
        file.read_exact(slice).expect("I/O error");
    }
}
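A different way to sidestep the alignment question raised above is to copy instead of reinterpret: read the bytes into a plain `Vec<u8>` and decode each 4-byte chunk. This is a swapped-in technique, not what was proposed in the chat. The helper name `read_u32s_le` is hypothetical, and the sketch assumes the file stores little-endian values (`u32::from_ne_bytes` would match the native-order behaviour of the reinterpretation instead).

```rust
use std::io::{self, Read};

/// Reads `n_bytes` from `reader` and decodes them as little-endian u32s.
/// The bytes are copied, so no alignment requirement applies.
/// A trailing partial chunk (when n_bytes % 4 != 0) is ignored.
fn read_u32s_le<R: Read>(mut reader: R, n_bytes: usize) -> io::Result<Vec<u32>> {
    let mut bytes = vec![0u8; n_bytes];
    reader.read_exact(&mut bytes)?;
    Ok(bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}

fn main() -> io::Result<()> {
    // An in-memory reader stands in for std::fs::File::open("data.bin")?;
    // any type implementing Read works the same way.
    let data: &[u8] = &[1, 0, 0, 0, 0, 1, 0, 0];
    let words = read_u32s_le(data, data.len())?;
    println!("{:?}", words); // prints [1, 256]
    Ok(())
}
```

The cost is one extra copy of the data, which in exchange removes all `unsafe` code.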
Sebastian Blei
@iamsebastian
Nov 07 2016 20:14
Any hints on the best way to run some calculations over a set of up to several million Postgres or CSV rows? The most comfortable approach is certainly to interpret each row as a struct, iterate over them and do some mappings, but I don't think that is particularly performant.
Sebastian Blei
@iamsebastian
Nov 07 2016 20:21
These rows contain data like UUIDs, text, bytecount, filename, dirname, etc. I need to extract some information out of them, like: if the bytecount is about some value, then do this; join files in a directory based on dirname; etc.
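For rows that fit in memory, the struct-and-iterator approach is usually not the bottleneck in Rust, since iterator chains over a `Vec` compile down to ordinary loops. A minimal sketch, assuming the rows have already been loaded (the Postgres/CSV loading step is omitted): the `Row` type, its fields, the `large_files_by_dir` helper and the 1024-byte threshold are all hypothetical placeholders for the real schema and rules.

```rust
use std::collections::HashMap;

/// Hypothetical row type; the fields stand in for whatever the Postgres
/// table or CSV file actually contains (UUIDs and other columns omitted).
struct Row {
    filename: String,
    dirname: String,
    bytecount: u64,
}

/// Groups the names of files larger than `min_bytes` by their directory.
/// Borrows from `rows`, so no strings are copied.
fn large_files_by_dir(rows: &[Row], min_bytes: u64) -> HashMap<&str, Vec<&str>> {
    let mut by_dir: HashMap<&str, Vec<&str>> = HashMap::new();
    for row in rows.iter().filter(|r| r.bytecount > min_bytes) {
        by_dir
            .entry(row.dirname.as_str())
            .or_insert_with(Vec::new)
            .push(row.filename.as_str());
    }
    by_dir
}

fn main() {
    let rows = vec![
        Row { filename: "a.bin".into(), dirname: "/data".into(), bytecount: 4096 },
        Row { filename: "b.txt".into(), dirname: "/data".into(), bytecount: 12 },
        Row { filename: "c.bin".into(), dirname: "/tmp".into(), bytecount: 8192 },
    ];
    // HashMap iteration order is unspecified, so the output order may vary.
    for (dir, files) in &large_files_by_dir(&rows, 1024) {
        println!("{}: {:?}", dir, files);
    }
}
```

Keeping the rows in one `Vec<Row>` and grouping with borrowed `&str` keys avoids per-row allocations during the analysis pass.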
Jonas Platte
@jplatte
Nov 07 2016 20:53
@Alteous One thing to note about your code is that the content of the last u32 won't be well-defined if n_bytes % 4 != 0, you might want to address that.
(unless Vec::with_capacity initializes the memory it allocates, but I doubt it does)
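One way to address both the alignment and the undefined trailing bytes is to allocate a zero-initialised `Vec<u32>`, rounding the word count up, and only then view it as bytes. A minimal sketch; the helper name `read_aligned` is hypothetical, and it is generic over `Read` so that a `File` or an in-memory buffer can be passed in.

```rust
use std::io::{self, Read};

/// Reads exactly `n_bytes` from `reader` into 4-byte-aligned storage.
/// The buffer is zero-initialised and rounded up to whole u32s, so when
/// `n_bytes % 4 != 0` the unused tail bytes of the last u32 are 0.
fn read_aligned<R: Read>(mut reader: R, n_bytes: usize) -> io::Result<Vec<u32>> {
    let n_words = (n_bytes + 3) / 4; // round up to whole u32s
    let mut buffer = vec![0u32; n_words];
    // SAFETY: u8 has alignment 1, the allocation holds 4 * n_words >= n_bytes
    // initialised bytes, and `bytes` is not used after `read_exact` returns.
    let bytes = unsafe {
        std::slice::from_raw_parts_mut(buffer.as_mut_ptr() as *mut u8, n_bytes)
    };
    reader.read_exact(bytes)?;
    Ok(buffer)
}

fn main() -> io::Result<()> {
    // An in-memory reader stands in for std::fs::File::open("data.bin")?.
    let data: &[u8] = &[1, 2, 3, 4, 5];
    let words = read_aligned(data, data.len())?;
    println!("{:?}", words);
    Ok(())
}
```

Unlike `with_capacity`, `vec![0u32; n]` both initialises the memory and sets the Vec's length, so the returned `buffer` actually reports the words that were read.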
David Harvey-Macaulay
@alteous
Nov 07 2016 21:09

@jplatte Well spotted, thanks for pointing out my oversight. I doubt the memory is initialised too. For now I'll take the easy route with:

let mut buffer = Vec::<u8>::with_capacity(n_bytes);
assert!((buffer.as_ptr() as usize) % std::mem::align_of::<u32>() == 0);

Or, as a last resort:

let mut buffer = alloc::heap::allocate(n_bytes, std::mem::align_of::<u32>());
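`alloc::heap::allocate` belonged to the unstable `alloc` crate and has since been removed. On current stable Rust the closest equivalent is `std::alloc` with an explicit `Layout`. A minimal sketch, using a hypothetical `AlignedBuf` wrapper that zero-fills the memory and frees it on drop; it rejects zero sizes because the global allocator must not be called with a zero-sized layout.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};

/// Zero-initialised byte buffer with a caller-chosen alignment (hypothetical).
struct AlignedBuf {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedBuf {
    fn new(n_bytes: usize, align: usize) -> AlignedBuf {
        assert!(n_bytes > 0, "zero-sized allocations are not supported here");
        let layout = Layout::from_size_align(n_bytes, align).expect("invalid layout");
        // SAFETY: the layout has a non-zero size.
        let ptr = unsafe { alloc_zeroed(layout) };
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        AlignedBuf { ptr, layout }
    }

    fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: `ptr` points to `layout.size()` zeroed bytes owned by `self`,
        // and the &mut borrow of `self` prevents aliasing.
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from the global allocator with exactly this layout.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}

fn main() {
    let align = std::mem::align_of::<u32>();
    let mut buf = AlignedBuf::new(1024, align);
    assert_eq!(buf.ptr as usize % align, 0);
    // The slice can be passed to Read::read_exact just like a Vec's bytes.
    buf.as_mut_slice()[0] = 42;
}
```

Pairing the allocation with `Drop` means the memory is released with the same `Layout` it was allocated with, which `dealloc` requires.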