These are chat archives for rust-lang/rust

18th
Jan 2017
Robyn Speer
@rspeer
Jan 18 2017 07:45
I'm trying to understand how the 'unicode_segmentation' crate works, and I don't understand its use of &&str. For example:
pub fn new_unicode_words<'b>(s: &'b str) -> UnicodeWords<'b> {
    use super::UnicodeSegmentation;
    use tables::util::is_alphanumeric;

    fn has_alphanumeric(s: &&str) -> bool { s.chars().any(|c| is_alphanumeric(c)) }
    let has_alphanumeric: fn(&&str) -> bool = has_alphanumeric; // coerce to fn pointer

    UnicodeWords { inner: s.split_word_bounds().filter(has_alphanumeric) }
}
Why is s of type &&str here, not just &str?
(in the inner function has_alphanumeric)
Aleksey Kladov
@matklad
Jan 18 2017 07:50

@rspeer filter is a method of Iterator, and it passes a reference to each item to the closure. Because the original is an iterator of string slices (&str), the second ampersand appears.

This could have been written as

UnicodeWords { inner: s.split_word_bounds().filter( |&s| has_alphanumeric(s)) }
if you would like has_alphanumeric to take a single reference (its signature would then be fn(&str) -> bool).
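A minimal sketch of both spellings. It assumes std's `split_whitespace` as a stand-in for `split_word_bounds` (so no external crate is needed) and `char::is_alphanumeric` in place of the crate's table lookup; the function names are hypothetical:

```rust
// Predicate in the shape `filter` expects: it receives `&Item`, and Item is `&str`.
fn has_alphanumeric_ref(s: &&str) -> bool {
    s.chars().any(|c| c.is_alphanumeric())
}

// Predicate that takes a single reference.
fn has_alphanumeric(s: &str) -> bool {
    s.chars().any(|c| c.is_alphanumeric())
}

// `split_whitespace` stands in for `split_word_bounds`; both yield `&str`.
fn words_via_fn(text: &str) -> Vec<&str> {
    text.split_whitespace().filter(has_alphanumeric_ref).collect()
}

fn words_via_closure(text: &str) -> Vec<&str> {
    // The `&s` pattern strips the outer reference, so `s` is `&str` here.
    text.split_whitespace().filter(|&s| has_alphanumeric(s)).collect()
}

fn main() {
    let text = "hello , world !";
    println!("{:?}", words_via_fn(text));
    println!("{:?}", words_via_closure(text));
}
```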
Robyn Speer
@rspeer
Jan 18 2017 07:52
Hmm. Maybe I should look up some more examples of filter to understand.
Aleksey Kladov
@matklad
Jan 18 2017 07:53
So, for example (1..4).filter(|x| true). The items are plain i32 values, so the type of x inside the lambda would be &i32.
This extra reference is a bit superfluous, in the same way as in &&str. It would be more natural to have x of type just i32.
This can be achieved if you write a pattern in the lambda argument: |&x| true.
Perhaps the type signatures would help? The original lambda looks like this |x: &i32| -> bool { true }
The updated one looks like this: |&x: &i32| -> bool { true }. The & after the : is part of the type. The one before x is part of a pattern.
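A small sketch of the two forms side by side. It assumes a range as the source, so the items are plain `i32` values and `filter` hands the closure an `&i32`; the function names are hypothetical:

```rust
fn count_even_by_ref() -> usize {
    // No pattern: `x` is `&i32`, so it is dereferenced by hand.
    (1..10).filter(|x: &i32| *x % 2 == 0).count()
}

fn count_even_by_pattern() -> usize {
    // `&x` is a pattern that matches the `&i32` argument and binds `x: i32`.
    (1..10).filter(|&x: &i32| x % 2 == 0).count()
}

fn main() {
    println!("{}", count_even_by_ref());
    println!("{}", count_even_by_pattern());
}
```

Both closures have the same type signature; only the binding inside the closure differs.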
Robyn Speer
@rspeer
Jan 18 2017 07:56
Looking at the Rust book's explanation for why .filter takes in references.
okay, it has to do with lifetimes.
so, sure, I can recognize that if a filter function on i32s takes a &i32 argument, a filter function on &strs takes a &&str argument.
Is the version with |&s| preferred over the function that takes in &&str? Is it slower?
Aleksey Kladov
@matklad
Jan 18 2017 07:59
Not exactly I think. filter can take values, if you iterate by values.
let xs = vec![...];
// if we use `.iter`, filter takes references, but we can iterate twice
xs.iter().filter(|x: &i32| ...)
xs.iter().filter(|x: &i32| ...)
// if we use `.into_iter`, filter takes values, but `xs` is gone afterwards.
xs.into_iter().filter(|x: i32| ...)
Robyn Speer
@rspeer
Jan 18 2017 08:00
I assume that, for that to work, the result of into_iter() is a different type than the result of iter(), so its filter method can have different effects
Aleksey Kladov
@matklad
Jan 18 2017 08:00
Yes, you are right!
.iter() gives you Iterator<Item=&i32>, and .into_iter() gives you Iterator<Item=i32>.
Robyn Speer
@rspeer
Jan 18 2017 08:02
Hm. The book is explaining this as a difference between map and filter, though, not a difference between .iter() (or iterating without calling .iter()) and .into_iter().
Aleksey Kladov
@matklad
Jan 18 2017 08:02

> Is the version with |&s| preferred over the function that takes in &&str? Is it slower?

Ah, actually the reason why a fn is used instead of a lambda is rather complicated. Every lambda has its own anonymous type that cannot be written down, so if you used a lambda, you wouldn't be able to return the iterator without boxing (this limitation would be lifted once impl Trait is stable).
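A sketch of that constraint, not the crate's actual code: `Words`, `words`, and `words_impl` are hypothetical names, and `split_whitespace` again stands in for `split_word_bounds`. The struct field needs a type that can be spelled out, which a fn pointer provides. The last function shows the `impl Trait` alternative, which has been stable since Rust 1.26:

```rust
use std::iter::Filter;
use std::str::SplitWhitespace;

// Hypothetical stand-in for `UnicodeWords`: the field type must be written
// out, which is possible only because the predicate is a plain fn pointer.
pub struct Words<'a> {
    inner: Filter<SplitWhitespace<'a>, fn(&&str) -> bool>,
}

impl<'a> Iterator for Words<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        self.inner.next()
    }
}

fn has_alphanumeric(s: &&str) -> bool {
    s.chars().any(|c| c.is_alphanumeric())
}

pub fn words(s: &str) -> Words<'_> {
    // Coerce the fn item to a fn pointer so that its type can be named.
    let pred: fn(&&str) -> bool = has_alphanumeric;
    Words { inner: s.split_whitespace().filter(pred) }
}

// With `impl Trait`, the closure's anonymous type never has to be named.
pub fn words_impl(s: &str) -> impl Iterator<Item = &str> {
    s.split_whitespace().filter(|w| w.chars().any(|c| c.is_alphanumeric()))
}

fn main() {
    println!("{:?}", words("a - b2 !").collect::<Vec<_>>());
    println!("{:?}", words_impl("a - b2 !").collect::<Vec<_>>());
}
```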

> Hm. The book is explaining this as a difference between map and filter, though.

Hm, I may be wrong, I have not double checked this :) It may be the case that filter always takes references, and iter vs into_iter is an orthogonal issue :sweat_smile:

Robyn Speer
@rspeer
Jan 18 2017 08:04
For example, (1..100).map(|x| x + 1) and (1..100).filter(|&x| x % 2 == 0) are two examples in https://doc.rust-lang.org/book/iterators.html
Aleksey Kladov
@matklad
Jan 18 2017 08:05
Yeah, filter must always take references, otherwise the element would be gone after filtering :fearful: Sorry for causing confusion :)
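To make the "gone after filtering" point concrete, here is a simplified, hypothetical re-implementation of the idea behind filter. `MyFilter` and `keep_nonempty` are made-up names, and this is not the standard library's actual code:

```rust
// Simplified adapter: holds the inner iterator and the predicate.
struct MyFilter<I, P> {
    iter: I,
    pred: P,
}

impl<I, P> Iterator for MyFilter<I, P>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        while let Some(item) = self.iter.next() {
            // The predicate only borrows `item`...
            if (self.pred)(&item) {
                // ...so it is still available to hand to the caller.
                return Some(item);
            }
        }
        None
    }
}

fn keep_nonempty(words: Vec<String>) -> Vec<String> {
    MyFilter { iter: words.into_iter(), pred: |w: &String| !w.is_empty() }.collect()
}

fn main() {
    let words = vec!["a".to_string(), String::new(), "b".to_string()];
    println!("{:?}", keep_nonempty(words));
}
```

If the predicate took `I::Item` by value, a non-Copy item such as a `String` would be moved into the predicate and could not be returned afterwards.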
Robyn Speer
@rspeer
Jan 18 2017 08:05
Okay
I don't know how that would interact with into_iter, but I'm not going to need that for a while, so it's all good
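For the into_iter case, a short sketch (the helper names are hypothetical): the items become owned `String`s, map receives each one by value, and filter still receives a reference and yields the owned item afterwards.

```rust
fn lengths(words: Vec<String>) -> Vec<usize> {
    // `map` takes ownership of each `String`; the closure consumes it.
    words.into_iter().map(|w: String| w.len()).collect()
}

fn long_words(words: Vec<String>) -> Vec<String> {
    // `filter` only lends each `String` to the closure, then yields it intact.
    words.into_iter().filter(|w: &String| w.len() > 3).collect()
}

fn main() {
    let words = vec!["ab".to_string(), "abcde".to_string()];
    println!("{:?}", lengths(words.clone()));
    println!("{:?}", long_words(words));
}
```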
Thanks!