These are chat archives for FreeCodeCamp/DataScience

22nd
Dec 2017
Emma
@Simmonds14
Dec 22 2017 00:23
@mcbarlowe It didn’t make sense when I wrote it. What do you use Python for with sports data, and what outputs are possible? I’ve been doing some neural networks to find patterns within sports data and some other work with Prozone data, as well as simulations.
Eric Leung
@erictleung
Dec 22 2017 00:46
@Simmonds14 what do you mean by "tennis formats"? I used to play tennis so I'm interested :smile: Are you talking about like singles vs doubles? Australian doubles?
Matthew Barlowe
@mcbarlowe
Dec 22 2017 03:26
@Simmonds14 well basically what has been done in the NHL and other sports is to look at what values correlate with wins, or in the case of the NHL, goals. They discovered that total shots taken matters, i.e. the team that gets the most shots usually scores more goals. Obviously it's gotten more advanced than that, but that was the basis of it. What I would do to start out is just gather as much data as you can and see what stats correlate best with wins. The thing with hockey is there is a lot of inherent variance or luck that I wouldn't think would be present in tennis
I don't know how in-depth the data you have is, but obviously there would be simple things to look at first, like the probability of a win by a player who gets first serve
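
A minimal sketch of the "see what stats correlate with wins" step suggested above, using the Pearson correlation coefficient. The `shots` and `wins` arrays are made-up numbers for illustration only, and `pearson` is a hypothetical helper name, not something from the chat.

```javascript
// Hypothetical per-team season totals; these numbers are invented for illustration.
const shots = [2400, 2550, 2300, 2700, 2600];
const wins = [38, 44, 35, 50, 45];

// Pearson correlation coefficient between two equal-length numeric arrays.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("need two arrays of equal length >= 2");
  }
  const mean = (a) => a.reduce((s, v) => s + v, 0) / a.length;
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < xs.length; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

const r = pearson(shots, wins);
console.log("correlation between shots and wins:", r.toFixed(3));
```

A value near 1 or -1 only says the two columns move together in this sample; it does not say which one drives the other.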
Yingjie (Iris) Hu
@huyingjie
Dec 22 2017 04:58
@erictleung @mcbarlowe Thank you. I will use R packages to create interactive plots. The packages use some JavaScript libraries. I hope that I can find a place to ask for help if I have questions in the future.
CamperBot
@camperbot
Dec 22 2017 04:58
huyingjie sends brownie points to @erictleung and @mcbarlowe :sparkles: :thumbsup: :sparkles:
:cookie: 142 | @mcbarlowe |http://www.freecodecamp.org/mcbarlowe
:cookie: 562 | @erictleung |http://www.freecodecamp.org/erictleung
evaristoc
@evaristoc
Dec 22 2017 11:28

@mcbarlowe Interesting... I wonder if you shouldn't try a harder model. The correlation you are suggesting does not necessarily hold in all cases, but it looks like the most direct correlation. But the fact that it is the better proxy does not necessarily make it the reason.

For example, soccer. I was reading some statistics and found that two well-known Spanish teams, Real Madrid and Barcelona, had different average shots on goal, with Real Madrid having a slightly larger average. But Barcelona is more precise.

But the explanation would start as follows: what makes a team able to shoot more?

In soccer, strategy plays a big role and formations might differ by opponent too. How flexible is the formation then?

Shots, in your case, could be a consequence of the main explaining factors.

It also depends on the kind of game. If strategy doesn't play a strong role, for example, there could be fewer explanatory factors. Tennis probably relies more on a few individual components: the game has few variants, and it is the outstanding characteristics of the player that make him/her a winner, e.g. Serena or Federer, to mention some. Serena in particular is known for her devastating service, because of her strength. But here strength and precision, as well as a winning mentality, could play a more important role than "is_servicing", to give you an example of my point.

So it is more like: "Because she is strong and precise and has a strong winning mentality, she has a devastating service. Players with a strong service usually have an advantage in winning the game over those with a weaker service."

The approach by @Simmonds14 is interesting. If there is a non-linear relation between factors, NNs are likely one model that will help to reveal it, but they won't provide a mathematical model. For that, I think you need to have a look at hierarchical modelling as an option.

I already shared this link and its second part a long time ago in this chatroom. For D'Amour, and for many researchers I have heard so far, the challenge is actually finding an analytical model rather than a data-based one.

Why analytical? Think of Physics: it is an explanation of the world. Data-based models are empirical, but not necessarily explanatory ones.

evaristoc
@evaristoc
Dec 22 2017 11:34
The model D'Amour is explaining involves a time variable, and with reason: sports are dynamic games where the previous state can be relevant to explaining the next state.
A state-based approach is likely the best way to build a good mathematical or data-based model.
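
One simple way to picture a state-based model where the previous state helps explain the next one is a Markov chain. This is only a sketch under stated assumptions: the two states and the transition probabilities are hypothetical, and this is not the model from D'Amour's talk.

```javascript
// Hypothetical two-state model: index 0 = "attacking", index 1 = "defending".
// transition[i][j] = P(next state is j | current state is i). Values are made up.
const transition = [
  [0.7, 0.3],
  [0.4, 0.6],
];

// Advance a probability distribution over states by one time step.
function step(dist, P) {
  return P[0].map((_, j) =>
    dist.reduce((sum, p, i) => sum + p * P[i][j], 0)
  );
}

let dist = [1, 0]; // start in "attacking" with certainty
for (let t = 1; t <= 3; t++) {
  dist = step(dist, transition);
  console.log("step", t, dist.map((p) => p.toFixed(3)));
}
```

Each step depends only on the current distribution, which is the "previous state explains the next state" idea in its simplest form.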
evaristoc
@evaristoc
Dec 22 2017 11:39

@mcbarlowe @Simmonds14 probably we could start from a simple case to realise the importance of the most immediate factors that might be relevant for a winning scenario.

A simple example of a sport with simple characteristics? Arm Wrestling.

Emma
@Simmonds14
Dec 22 2017 14:34

@erictleung I’m comparing the current Grand Slam format with the Fast4 format that has been introduced at the Next Generation finals (and is used at some exhibition events), for both men and women, to see how changing the format would affect a player’s chances of winning a point, game, tiebreak/set, and also what would happen in tournaments. It uses the equation by Croucher for the probability of a player winning a game, which is then simulated into a tournament format. You can then ask: if a player has this probability of winning a game, how would they compare when playing a player with that probability? I must credit my lecturer for his help on this though.
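
A hedged sketch of the kind of calculation described above. It assumes points are independent with a constant probability `p` of the server winning each point, and uses the standard closed form for winning a game under advantage scoring; whether this matches the exact Croucher equation used in the project is an assumption. It also assumes Fast4 games use no-ad scoring (a single deciding point at deuce). The function names are hypothetical.

```javascript
// p = probability that the server wins any single point (assumed constant and independent).

// Standard game with advantage scoring.
function gameWinStandard(p) {
  const q = 1 - p;
  // Server wins 4 points while the opponent wins 0, 1 or 2.
  const beforeDeuce = Math.pow(p, 4) * (1 + 4 * q + 10 * q * q);
  // Probability of reaching deuce (3 points each).
  const reachDeuce = 20 * Math.pow(p, 3) * Math.pow(q, 3);
  // From deuce, the server must get two points clear.
  const winFromDeuce = (p * p) / (1 - 2 * p * q);
  return beforeDeuce + reachDeuce * winFromDeuce;
}

// No-ad game (assumed here for Fast4): one deciding point at deuce.
function gameWinNoAd(p) {
  const q = 1 - p;
  const beforeDeuce = Math.pow(p, 4) * (1 + 4 * q + 10 * q * q);
  const reachDeuce = 20 * Math.pow(p, 3) * Math.pow(q, 3);
  return beforeDeuce + reachDeuce * p;
}

for (const p of [0.5, 0.55, 0.6, 0.65]) {
  console.log(p, gameWinStandard(p).toFixed(4), gameWinNoAd(p).toFixed(4));
}
```

Under these assumptions, a server with `p` above 0.5 wins a standard game slightly more often than a no-ad game, because the longer deuce sequence gives the stronger player more points to assert the edge. Set and tournament probabilities would be built on top of these by simulation.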

@mcbarlowe I’m assuming then you have data with regards to passing sequences prior to shots and goals to enable you to do that. I have a collection of tennis data but no point-by-point data, although I know where to get hold of it. There is definitely variance and luck in tennis. I’m not sure off the top of my head what work has been done on it, but I will definitely start by seeing if anything correlates.

@evaristoc Thank you for posting that video. I’ve been using tracking data, but not in that regard. You’ve all definitely helped with a few ideas of things that could be done 😃

CamperBot
@camperbot
Dec 22 2017 14:34
simmonds14 sends brownie points to @erictleung and @mcbarlowe and @evaristoc :sparkles: :thumbsup: :sparkles:
:cookie: 143 | @mcbarlowe |http://www.freecodecamp.org/mcbarlowe
:cookie: 388 | @evaristoc |http://www.freecodecamp.org/evaristoc
:cookie: 563 | @erictleung |http://www.freecodecamp.org/erictleung
Yingjie (Iris) Hu
@huyingjie
Dec 22 2017 15:32
I am doing data visualization.
       .attr("x", (d, i) => {
         // Add your code below this line
         return i * 30;

         // Add your code above this line
       })
What is the name for (d, i) => ? I searched “closure” online, but tutorials do not show functions in this style. How can I find a tutorial for it?
evaristoc
@evaristoc
Dec 22 2017 15:41

@Simmonds14 no worries! I hope that helps. You are going through a very interesting topic indeed.

@huyingjie Sorry, Yingjie. Not sure where the confusion is.

If you are confused about how the JavaScript function was written, it is ES6 syntax. ES6 (ES2015) is the major revision of JS that came after ES5.

If you are rather confused about passing a function as an argument to the attr method, the way it is done here is by using an anonymous function.
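
To make both points concrete, here is a small sketch comparing an ES5 anonymous function with ES6 arrow functions. It uses `Array.prototype.map` as a stand-in for D3's `.attr` callback so it runs without D3; the `(d, i)` parameters play the same role (value and index). The variable names are made up for the example.

```javascript
const data = [10, 20, 30];

// ES5: traditional anonymous function expression passed as an argument.
const xPositionsA = data.map(function (d, i) {
  return i * 30;
});

// ES6 arrow function with a block body: an explicit return is required.
const xPositionsB = data.map((d, i) => {
  return i * 30;
});

// ES6 arrow function with an expression body: the value is returned implicitly.
const xPositionsC = data.map((d, i) => i * 30);

// A block body without return yields undefined for every element.
const xPositionsD = data.map((d, i) => {
  i * 30;
});

console.log(xPositionsA, xPositionsB, xPositionsC, xPositionsD);
```

The search keyword for the `(d, i) => ...` form is "arrow function".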

I can confirm there is surely a lot of information on the internet about both. Not sure if you are able to find it easily?

Those are topics people also discuss a lot in the FCC forum and the visualization chatroom.

I think the most convenient tutorial for you right now is about JS-ES6.

Yingjie (Iris) Hu
@huyingjie
Dec 22 2017 15:48
@evaristoc I found the keyword to search for related information. I learned from the tutorial on the old website, then jumped to data visualization on the beta. The beta taught arrow functions but the old website did not. That is why I was confused. I don’t want to learn all of ES6 because it is overwhelming. I like freeCodeCamp because it will not overwhelm learners.
evaristoc
@evaristoc
Dec 22 2017 15:50
@huyingjie Great! Good luck!! Personally, I think the beta is an evolution over the previous version. I really recommend it. @erictleung was very much involved, by the way.
Matthew Barlowe
@mcbarlowe
Dec 22 2017 16:50
@evaristoc there are more complex models, but you are limited by the data you have, and soccer and hockey are completely different in regards to shots and how one should weight them
Plus more complex isn’t always better, especially when you don’t even know what the features you are looking for should be
Eric Leung
@erictleung
Dec 22 2017 22:18
@Simmonds14 oh I see. I've never heard of the new fast4 format before. I guess I've been out of the loop for a while now haha. Sounds like a reasonable data driven project. I'll be interested in hearing the results. I'm sure pro tennis athletes will probably use similar analyses to predict their performance.
Speaking of tennis, just read this article on how to become a good data scientist based on how Nadal became a great tennis player: “Become the Rafael Nadal of Machine Learning” https://medium.freecodecamp.org/baby-steps-to-learn-machine-learning-from-a-tennis-fan-d4171f51c23f
I like the connection between the two of trying to be good at something and I agree with most of the parallels between them.
evaristoc
@evaristoc
Dec 22 2017 22:53

Hmmm... studying the d3 force layout module I ended up at this page https://github.com/d3/d3-force, which talks about velocity Verlet integration, which took me to a particular case of numerical solutions based on symplectic space / geometry (if I am not wrong and very simply put, those where an invariant is preserved, e.g. energy, https://www.youtube.com/watch?v=QyNAiEZhBW8). I had to review the concept of Hilbert spaces (if I am not wrong and very simply put, those where the inner product is defined, https://www.youtube.com/watch?v=jWkzBaJDSmY).
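
For readers who have not met velocity Verlet before, here is a minimal textbook sketch applied to a unit-mass spring, where acceleration is `-k * x`. This is not d3-force's actual code; the function names, spring constant and step size are assumptions chosen for illustration. It prints the energy at the start and after many steps so the near-conservation can be seen.

```javascript
// Acceleration of a unit-mass harmonic oscillator with spring constant k.
function acceleration(x, k) {
  return -k * x;
}

// Textbook velocity Verlet integration over a fixed number of steps.
function velocityVerlet(x0, v0, k, dt, steps) {
  let x = x0;
  let v = v0;
  let a = acceleration(x, k);
  for (let n = 0; n < steps; n++) {
    x += v * dt + 0.5 * a * dt * dt; // position update
    const aNew = acceleration(x, k); // acceleration at the new position
    v += 0.5 * (a + aNew) * dt; // velocity update with the averaged acceleration
    a = aNew;
  }
  return { x, v };
}

// Total energy: kinetic plus potential.
function energy(x, v, k) {
  return 0.5 * v * v + 0.5 * k * x * x;
}

const initial = { x: 1, v: 0 };
const final = velocityVerlet(initial.x, initial.v, 1, 0.01, 10000);
console.log("energy before:", energy(initial.x, initial.v, 1));
console.log("energy after: ", energy(final.x, final.v, 1));
```

The energy does not stay exactly constant, but its error stays bounded instead of drifting, which is the practical appeal of symplectic methods for long-running simulations.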

Don't think I am a mathematician. I simply like it.

Very interesting that Mike Bostock and co. were using an advanced numerical method for solving the force layout (I didn't know until today...). Wondering which language was used for that. Javascript?

Anyway... going back to the d3 layout stuff.


@mcbarlowe regarding the comments about sport analytics: absolutely good point, sorry for not taking that into consideration.
evaristoc
@evaristoc
Dec 22 2017 23:18

It seems that the force layout of d3.js version 4 is rather more advanced than version 3? Really cool!!!!

I think Mike said once he wanted to break d3.js into modules. I think I see now why: apart from modularity, it seems that the plan is to develop super sub-libraries that could evolve independently of the whole, maybe?

The new geo library of version 4, for example, looks amazing...
evaristoc
@evaristoc
Dec 22 2017 23:30

People:

Just fell by accident on this link while quickly glancing at Mike Bostock's Twitter. It reminds me of a talk I went to last year. But the following was made more for fun / popularization purposes than for analytical purposes:

What the Neural Nets see:

https://distill.pub/2017/feature-visualization/