These are chat archives for rust-lang/rust

27th
Jul 2018
tsoernes
@tsoernes
Jul 27 2018 00:32
Is there any way to get the type system to help me in this case? Consider a binary tree where a node either a) is a leaf node and has a value, b) does not have a value and has a left child, c) does not have a value and has a right child, or d) does not have a value and has both left and right children.
tsoernes
@tsoernes
Jul 27 2018 01:49
Any tips for this one? It uses the ndarray crate:
fn split<'a>(
    x: ArrayView1<'a, f64>,
    y: ArrayView1<'a, f64>,
) -> (
    ArrayView1<'a, f64>,
    ArrayView1<'a, f64>,
    ArrayView1<'a, f64>,
    ArrayView1<'a, f64>,
) {
    let m = x.len() / 2;
    // `x` does not live long enough (borrowed value does not live long enough) (rust-cargo)
    // `y` does not live long enough (borrowed value does not live long enough) (rust-cargo)
    return (
        x.slice(s![..m]),
        y.slice(s![..m]),
        x.slice(s![m..]),
        y.slice(s![m..]),
    );
}
Michal 'vorner' Vaner
@vorner
Jul 27 2018 07:13

@tsoernes You certainly could define the node as an enum, something like:

enum Node<T> {
    Value(T),
    Left(Box<Node<T>>),
    Right(Box<Node<T>>),
    Both(Box<Node<T>>, Box<Node<T>>),
}

I'm not sure whether that's overkill, or whether a leaf/inner distinction with Options for the child nodes would be easier to work with.

@nyarly The declaration is not saying that HeaderMap is a newtype/wrapper around HeaderValue. It says that HeaderMap can take any T for its values and that the default, if you don't specify T, is HeaderValue. But it's still a map of them, not an alias.
tsoernes
@tsoernes
Jul 27 2018 07:16
Yes. My spec was a bit incorrect. I'm doing decision trees and ended up with this, which seems OK so far:
enum TreeNode {
    Leaf {
        value: f64,
    },
    Node {
        feature_idx: usize,
        threshold: f64,
        left: Box<TreeNode>,
        right: Box<TreeNode>,
    },
}
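For illustration, a minimal sketch of how an enum like this could be traversed to make a prediction. The predict method, the "go left when below the threshold" convention, and the example_tree helper are assumptions made for this sketch, not part of the original discussion:

```rust
/// Decision tree node: a leaf holds a value, an inner node always has two children.
#[derive(Debug)]
enum TreeNode {
    Leaf {
        value: f64,
    },
    Node {
        feature_idx: usize,
        threshold: f64,
        left: Box<TreeNode>,
        right: Box<TreeNode>,
    },
}

impl TreeNode {
    /// Hypothetical helper: walk down the tree for one sample.
    /// Assumed convention: go left when the feature is below the threshold,
    /// right otherwise. Panics if `feature_idx` is out of range for `sample`.
    fn predict(&self, sample: &[f64]) -> f64 {
        match self {
            TreeNode::Leaf { value } => *value,
            TreeNode::Node { feature_idx, threshold, left, right } => {
                if sample[*feature_idx] < *threshold {
                    left.predict(sample)
                } else {
                    right.predict(sample)
                }
            }
        }
    }
}

/// Hypothetical example tree: a single split on feature 0 at 0.5.
fn example_tree() -> TreeNode {
    TreeNode::Node {
        feature_idx: 0,
        threshold: 0.5,
        left: Box::new(TreeNode::Leaf { value: -1.0 }),
        right: Box::new(TreeNode::Leaf { value: 1.0 }),
    }
}
```

Because every inner node owns exactly two boxed children, the match needs no Option handling; the compiler rejects a Node that lacks either child.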
Michal 'vorner' Vaner
@vorner
Jul 27 2018 07:17
Maybe you want Option<Box<TreeNode>> for the children (so they may be missing, if that is allowed)
tsoernes
@tsoernes
Jul 27 2018 07:21
The spec I gave turned out to be incorrect for my problem; in decision trees there are always either two child nodes or none (if there are none, then it's a leaf with a value)
Michal 'vorner' Vaner
@vorner
Jul 27 2018 07:32
Right, then yours should work fine.
Denis Lisov
@tanriol
Jul 27 2018 07:48
@nyarly IIRC, this means "with a parameter T that, if not mentioned, defaults to HeaderValue"
Denis Lisov
@tanriol
Jul 27 2018 08:02
@tsoernes Either ArrayView::split_at or ArrayView::reborrow plus slice_move
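To show why a consuming split sidesteps the "does not live long enough" error, here is a minimal sketch using plain standard-library slices rather than ndarray (a swapped-in analog, not the ndarray API itself). The halves returned by the standard slice split_at borrow from the original data for 'a, not from a local variable inside the function. With ndarray, the corresponding call would presumably be x.split_at(Axis(0), m), as suggested above.

```rust
/// Split two slices at the midpoint of `x`.
/// The returned halves borrow from the caller's data for 'a,
/// so nothing refers to a local that dies at the end of the function.
/// Panics if `y` is shorter than `x.len() / 2`.
fn split<'a>(
    x: &'a [f64],
    y: &'a [f64],
) -> (&'a [f64], &'a [f64], &'a [f64], &'a [f64]) {
    let m = x.len() / 2;
    // split_at returns (first m elements, the rest) without copying
    let (x_lo, x_hi) = x.split_at(m);
    let (y_lo, y_hi) = y.split_at(m);
    (x_lo, y_lo, x_hi, y_hi)
}
```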
tsoernes
@tsoernes
Jul 27 2018 13:00
@tanriol Thanks, split_at did the trick for that example. Do you have any idea how to split the view with a filter, without copying?
That is, the numpy equivalent of X[np.where(X[:, i] < thresh)]; X[np.where(X[:, i] >= thresh)], which will create two views of a 2D array: one with the rows where column i is less than a threshold, and another with the remaining rows. It is possible to map over the 2D array and generate two arrays of indices (one for the rows where the column is below the threshold, and one for the rest) and then use select; however, that creates a copy of the array
tsoernes
@tsoernes
Jul 27 2018 13:12
Take a look:
/// Split `x` into two subviews: one with the rows where the value in the `feature_idx` column is below `threshold`,
/// and one where the value is equal or greater
fn split2<'a>(
    x: ArrayView2<'a, f64>,
    feature_idx: usize,
    threshold: f64,
) -> (ArrayView2<'a, f64>, ArrayView2<'a, f64>) {
    // Any way to do both of these in 1 pass?
    let idxs_lt: Vec<usize> = x.outer_iter()
        .enumerate()
        .filter(|(i, e)| e[[feature_idx]] < threshold)
        .map(|(i, e)| i)
        .collect();
    let idxs_gt: Vec<usize> = x.outer_iter()
        .enumerate()
        .filter(|(i, e)| e[[feature_idx]] >= threshold)
        .map(|(i, e)| i)
        .collect();
    let xl = x.reborrow().select(Axis(0), &idxs_lt);
    let xr = x.reborrow().select(Axis(0), &idxs_gt);
    // mismatched types (expected struct `ndarray::ViewRepr`, found struct `ndarray::OwnedRepr`) (rust-cargo)
    // expected type `ndarray::ArrayBase<ndarray::ViewRepr<&'a f64>, _>`
    // found type `ndarray::ArrayBase<ndarray::OwnedRepr<f64>, _>` (rust-cargo)
    // I don't want to copy the values in x
    (xl, xr)
}
you could of course do .view() after the select, but that creates a copy which I do not want
Ichoran
@Ichoran
Jul 27 2018 14:00
Why not build up both Vecs as you go?
Wait, why don't you want to copy the values in x?
Anyway, you can manually build the Vecs in a for.
Ichoran
@Ichoran
Jul 27 2018 14:05
Or you can use partition (for the first part): https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.partition
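A minimal sketch of the one-pass index split with Iterator::partition. To keep it independent of ndarray, the rows are assumed to be stored as a slice of Vec<f64>, and partition_rows is a hypothetical helper name:

```rust
/// Split row indices in a single pass: indices of rows whose `feature_idx`
/// column is below `threshold` come first, all remaining indices second.
/// Panics if a row has no column `feature_idx`.
fn partition_rows(
    rows: &[Vec<f64>],
    feature_idx: usize,
    threshold: f64,
) -> (Vec<usize>, Vec<usize>) {
    // partition sends each index to the first collection when the
    // predicate is true and to the second one otherwise
    (0..rows.len()).partition(|&i| rows[i][feature_idx] < threshold)
}
```

The two index vectors could then be passed to select as in the snippet above, which still copies the selected rows.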
Zakarum
@omni-viral
Jul 27 2018 14:58
How can I check that a u64 value is within 0 .. usize::MAX?
For value: u64, the check value <= usize::MAX as u64 doesn't work, as usize::MAX is not guaranteed to fit into u64
Lyle Mantooth
@IslandUsurper
Jul 27 2018 15:03
@omni-viral, I think there are cfg flags for figuring out the size of usize. If usize is u64 or smaller, what you wrote works, otherwise . . . well, you don't need a check, because it's smaller by definition.
Err, I may have that backwards.
Zakarum
@omni-viral
Jul 27 2018 15:04
No. This is correct
But I guess that if usize is u64, no check is required either
Lyle Mantooth
@IslandUsurper
Jul 27 2018 15:08
#[cfg(target_pointer_width = "32")] may be all you need. The reference only implies 32 and 64 as valid values.
Zakarum
@omni-viral
Jul 27 2018 15:08
So I only check when any(target_pointer_width = "16", target_pointer_width = "32")
target_pointer_width = "16" should be valid since it is used in std
Lyle Mantooth
@IslandUsurper
Jul 27 2018 15:09
OK, good to know.
Denis Lisov
@tanriol
Jul 27 2018 20:19
@tsoernes Even in numpy these create partial copies, not views. A view has certain structural requirements that do not hold for an arbitrary filter.
Ichoran
@Ichoran
Jul 27 2018 20:50
@omni-viral - x == ((x as usize) as u64) should do the trick at runtime.
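A minimal sketch of the runtime check, next to an alternative based on usize::try_from, which is available on newer Rust versions where TryFrom is stable. The function names fits_roundtrip and to_usize are hypothetical:

```rust
use std::convert::TryFrom;

/// Round-trip check from the message above: an `as` cast to a narrower
/// type truncates, so the value survives the round trip only if it
/// fits into usize on the current target.
fn fits_roundtrip(x: u64) -> bool {
    x == ((x as usize) as u64)
}

/// Alternative: let the standard conversion do the range check.
/// Returns None when the value does not fit into usize.
fn to_usize(x: u64) -> Option<usize> {
    usize::try_from(x).ok()
}
```

Both variants work for any pointer width without cfg attributes; on targets where usize is at least 64 bits wide the check always succeeds.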
tsoernes
@tsoernes
Jul 27 2018 23:05

Any idea how to create a 2D ndarray directly from a CSV file? This works:

    let mut rdr = csv::Reader::from_path(file_path).unwrap();
    let xx: Array1<Array1<f64>> = rdr.records()
        .map(|row| {
            row.unwrap()
                .into_iter()
                .map(|e| e.parse().unwrap())
                .collect()
        })
        .collect();

But using let xx: Array2<f64> does not

Ichoran
@Ichoran
Jul 27 2018 23:23
I don't know how that would work conceptually. You won't know how to size the 2D array until you've read everything.
Unless you don't even have the memory to store the array twice--at which point I'd question whether you can do much of anything useful with ndarray--the fastest thing will be to build it like that and then repack it into a 2D array.
Because at that point you will know how big it is and allocate the appropriate amount of memory in one go. Bulk memory copies are really, really fast.
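A minimal sketch of a variation on this idea: collect all values into one flat row-major Vec while counting rows and columns, so the shape is known once reading finishes. The helper parse_flat is hypothetical, uses only the standard library, and assumes plain comma-separated numbers without a header line or quoting. With ndarray, the buffer could then be handed over via Array2::from_shape_vec((n_rows, n_cols), data).

```rust
/// Parse comma-separated numeric text into a flat row-major buffer
/// plus its (rows, cols) shape. Blank lines are skipped.
/// Returns an error for unparsable fields or rows of differing length.
fn parse_flat(text: &str) -> Result<(Vec<f64>, (usize, usize)), String> {
    let mut data = Vec::new();
    let mut n_rows = 0;
    let mut n_cols = 0;
    for line in text.lines().filter(|l| !l.trim().is_empty()) {
        let before = data.len();
        for field in line.split(',') {
            let v: f64 = field
                .trim()
                .parse()
                .map_err(|e| format!("bad field {:?}: {}", field, e))?;
            data.push(v);
        }
        // Number of values this row contributed
        let width = data.len() - before;
        if n_rows == 0 {
            n_cols = width;
        } else if width != n_cols {
            return Err(format!(
                "row {} has {} fields, expected {}",
                n_rows, width, n_cols
            ));
        }
        n_rows += 1;
    }
    Ok((data, (n_rows, n_cols)))
}
```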
tsoernes
@tsoernes
Jul 27 2018 23:32
okay thank you.