What is it all about?
Anyone who has ever watched a sports competition is familiar with expressions like “on fire”, “in the zone”, “on a roll”, “momentum” and so on. But what do these expressions really mean? In 1985, when Thomas Gilovich, Robert Vallone and Amos Tversky studied this phenomenon for the first time, they defined it as follows: “…these phrases express a belief that the performance of a player during a particular period is significantly better than expected on the basis of the player’s overall record.” Their conclusion was that what people tend to perceive as a “hot hand” is essentially a cognitive illusion caused by a misperception of random sequences. Until recently there was little, if any, evidence against their conclusion. Increased computing power and newly available data from various sports now provide surprising evidence that the phenomenon is real, reigniting the debate.
To understand what “expected” means, let us restrict the discussion to results that can be defined in a binary sense, namely successes and failures. In each trial the result is drawn randomly with some probability of success. In this framework, the words “expected on the basis of the player’s overall record” mean that the probability of success in each trial is constant over time and independent of previous results.
Gilovich, Vallone and Tversky argued that time series of results from basketball are indistinguishable from repeated tosses of an uneven coin (a coin whose probability of success may differ from 50%). Although their conclusions were extremely influential in the scientific community, they were also highly controversial, as the vast majority of sports fans remained confident that players are sometimes indeed “on fire”. Amos Tversky described the situation by saying: “I’ve been in a thousand arguments over this topic, won them all, but convinced no one.” Stephen Jay Gould wrote: “Everybody knows about hot hands. The only problem is that no such phenomenon exists.”
Could it be that the fans were right after all? The answer is a little complicated and depends on the specific task, but the data seem to suggest that the hot hand does exist.
When studying this phenomenon, one major complicating factor is the presence of an opponent. The probability of success no longer depends only on the skills of the player; it is also confounded by the performance and strategy of the opposing players. A player who “gets in the zone” is likely to change the defensive strategy of the opposing team, making it more difficult for him to perform. Moreover, different opponents have different skills, which leads to tasks of varying difficulty. All these factors make testing for the existence of the hot hand very difficult and call for more complex models. These confounding factors can be avoided by considering tasks with minimal external interference.
So how can one distinguish between a “pure random series” (essentially repeated coin tosses) and something else? This is where statistics comes in handy. Without delving deep into the technicalities of the different statistical tests, we would like to note one crucial point: the fact that a statistical test does not detect a phenomenon does not mean that the phenomenon does not exist. Most statistical tests are designed to reject a null hypothesis, and failing to reject it does not mean that the null hypothesis is correct. It might be that the statistical test used is not sensitive enough for the type of data and phenomenon being tested. It can also be that the data are insufficient to yield a definite answer. In the case of the hot hand phenomenon, it turns out that both apply: the tests used in many of the papers that studied this phenomenon were not adequate, and the data were insufficient in many cases (see a wonderful presentation by Nobel Laureate Brian Josephson about this type of recurring error, and also the following two papers about the inadequate tests used to detect the hot hand phenomenon).
So what’s new?
Until recently, there was practically no evidence for the presence of the “hot hand” phenomenon in sports (see review). However, as data mining and statistical methods have improved dramatically, the “hot hand” phenomenon has received support in various domains. Some examples are (see also this book and website):
The existence of the hot hand means that you cannot model a series of an athlete’s performances as repeated coin tosses. The observed fluctuations between good and bad periods are larger than would be expected from a purely independent random process.
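To make “larger than expected” concrete, here is a minimal simulation sketch. It is not the analysis used in any of the studies mentioned, and all the numbers are assumptions chosen for illustration: games of 10 trials, a constant success probability of 0.5 for the coin-toss model, and equally likely “good” (0.7) and “bad” (0.3) games for the alternative. The helper name game_totals is ours.

```python
import random
import statistics

def game_totals(n_games, n_frames, p_for_game, rng):
    """Count successes per game; p_for_game() supplies each game's success probability."""
    totals = []
    for _ in range(n_games):
        p = p_for_game()  # fixed for the whole game
        totals.append(sum(rng.random() < p for _ in range(n_frames)))
    return totals

rng = random.Random(0)          # fixed seed so the run is repeatable
n_games, n_frames = 20000, 10   # hypothetical sizes, not taken from any study

# Coin-toss model: every trial is independent with the same p = 0.5.
constant = game_totals(n_games, n_frames, lambda: 0.5, rng)

# Alternative: each game is "good" (p = 0.7) or "bad" (p = 0.3) with equal chance.
mixed = game_totals(n_games, n_frames, lambda: rng.choice([0.7, 0.3]), rng)

# Variance of the per-game total predicted by the pure coin-toss model.
binomial_var = n_frames * 0.5 * 0.5

print("coin-toss prediction:", binomial_var)
print("constant p, observed:", statistics.pvariance(constant))
print("good/bad games, observed:", statistics.pvariance(mixed))
```

Under these assumptions both models have the same average number of successes per game, but the spread of the per-game totals in the good/bad model is well above the coin-toss prediction. That excess spread is the kind of signature referred to above.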
Interestingly, a contradicting example was found in basketball three-point attempts, where the data actually show an “anti-hot hand”. But as mentioned above, in this setting the defensive strategy is important and is likely to influence the performance of the player: a player who has a “hot hand” will attract more attention from the defense, which can directly influence the results of future trials.
These examples show a correlation between current results and previous ones, so an athlete’s performance is not just repeated coin tosses. Does this mean that “success breeds success” and “failure breeds failure”, or is there something else at hand?
Correlation vs. Causation
Most people will agree that there is a significant correlation between weather conditions and the number of people carrying open umbrellas: the number increases significantly on rainy days. Does this mean that open umbrellas cause rain? Did we just find the solution to drought?
Correlation and causation are often confused. From a statistical point of view, telling them apart is difficult. The human mind looks for reasons and tends to misinterpret correlation as causation. To show that something actually causes something else, one has to perform more detailed studies and not rely on statistical correlation alone. Correlation is necessary for causation but not sufficient.
Despite the definition given above, many researchers refer to the “hot hand” phenomenon as some kind of psychological “feedback” mechanism, which changes athletes’ performance because of their recent results (causality). What we and other researchers have observed lately is a correlation between current results and previous ones. But does this mean that previous results influence players’ performance in their next attempts (causation)?
In a paper published in PLoS ONE, we (Gur Yaari and Gil David) present our analysis of the “hot hand” phenomenon in bowling data. We studied almost 50,000 bowling games, extracted from the Professional Bowlers Association (PBA) website. Each game was represented as a frame-by-frame series of zeros and ones. If the bowler got a strike in a frame, the frame was counted as a success (1); otherwise it was counted as a failure (0).
We were able to supply evidence that players exhibit “good” games and “bad” games that cannot be explained by pure luck alone, which means that the series cannot be modeled as repeated coin tosses. This observation supports the existence of the “hot hand” in bowling, in line with previous studies in this domain. In addition, our new observation shows that within each game, successes and failures (i.e. strikes and non-strikes) are not grouped together in continuous runs. On the contrary, they are spread randomly within each game. Thus, we show that the result of one frame does not influence the result of the next frame in a causal manner: if a player had a success in the 4th frame, this by itself does not affect the player’s performance in the 5th frame.
On the other hand, we also showed that it is possible to use the first observation of “good” and “bad” games (correlation) to improve the prediction for the results of the last frames in a game, based on the whole series of the preceding frames. In other words, the fact that a player had good results during the first 8 frames indicates that this is a “good” game and thus his/her probability of rolling a strike in the remaining frames will be higher.
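The idea of using the early frames to predict the late ones can be sketched with a simple two-state model. This is only an illustration, not the method of the paper: it assumes every game is either “good” or “bad”, and the strike probabilities (0.7 and 0.4), the 50/50 prior and the function name p_strike_next are all hypothetical.

```python
from math import comb

def p_strike_next(strikes, frames, p_good=0.7, p_bad=0.4, prior_good=0.5):
    """Probability of a strike in the next frame, given `strikes` in the first `frames`.

    Assumes a hypothetical two-state model: the whole game is either "good"
    (strike probability p_good) or "bad" (p_bad), and frames within a game
    are independent of each other.
    """
    misses = frames - strikes
    # Likelihood of the observed count under each type of game.
    like_good = comb(frames, strikes) * p_good**strikes * (1 - p_good)**misses
    like_bad = comb(frames, strikes) * p_bad**strikes * (1 - p_bad)**misses
    # Bayes' rule: how likely is it that this is a "good" game?
    post_good = like_good * prior_good / (
        like_good * prior_good + like_bad * (1 - prior_good)
    )
    # Weight the two strike probabilities by that belief.
    return post_good * p_good + (1 - post_good) * p_bad

for k in (2, 4, 7):
    print(k, "strikes in 8 frames ->", round(p_strike_next(k, 8), 3))
```

In this sketch the prediction rises with the number of strikes already observed, even though no frame influences the next one. The early frames only reveal which kind of game it is.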
An analogy that may help to understand these results is to imagine two coins:
The first coin is a fair coin with a 50% chance of showing heads and a 50% chance of showing tails on each toss.
The second coin comes up heads with probability 99% for 50 tosses (one “day”), then comes up tails with probability 99% for the next 50 tosses, and so on (think of it as someone who alternates between extremely good and extremely bad days).
Now suppose you are given the results of two consecutive tosses of each coin. For the first coin, the result you observe on the first toss does not change the fact that the second toss has a 50% probability of showing heads.
On the other hand, in the case of the second coin (the one with the “hot hand”), if the first toss was heads, it most likely means the coin is having a good day. Hence, assuming both tosses fall on the same day, the probability that the second toss will be heads is ~98%. If you observe tails on the first toss, it most likely means it is a bad day, and the probability that the second toss will be heads is only ~2%.
As you can see, both coins have a 50% probability of landing heads in the long run, and both coins have good days and bad days. However, only for the second coin can one identify a “hot hand”, because of the magnitude of the fluctuations between good and bad days (good days are really good and bad days are really bad!).
Moreover, the second coin’s results are not due to causation: the result is heads (for example) because the coin is most likely having a good day, not because the preceding toss was heads.
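The ~98% and ~2% figures for the second coin can be checked with a short calculation. The sketch below assumes that both tosses fall on the same day and that, before seeing anything, a good day and a bad day are equally likely. The names are ours.

```python
from fractions import Fraction

P_HEAD_GOOD = Fraction(99, 100)  # chance of heads on a good day
P_HEAD_BAD = Fraction(1, 100)    # chance of heads on a bad day
P_GOOD_DAY = Fraction(1, 2)      # assumed prior: good and bad days equally likely

def p_second_head(first_is_head):
    """Probability that the second toss is heads, given the first toss.

    Assumes both tosses happen on the same day.
    """
    # How likely is the first toss under each kind of day?
    like_good = P_HEAD_GOOD if first_is_head else 1 - P_HEAD_GOOD
    like_bad = P_HEAD_BAD if first_is_head else 1 - P_HEAD_BAD
    # Bayes' rule: belief that today is a good day after seeing the first toss.
    evidence = like_good * P_GOOD_DAY + like_bad * (1 - P_GOOD_DAY)
    post_good = like_good * P_GOOD_DAY / evidence
    # The second toss depends only on the kind of day, not on the first toss.
    return post_good * P_HEAD_GOOD + (1 - post_good) * P_HEAD_BAD

print(float(p_second_head(True)))   # 0.9802
print(float(p_second_head(False)))  # 0.0198
```

The first toss enters the calculation only through what it reveals about the day. Nothing in the code lets one toss change the next, which is the sense in which the correlation here is not causation.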
Our research, and the other studies mentioned here, demonstrate that players’ performances are not affected by the results of previous trials, but rather by other factors that make the resulting time series more complex than a simple series of coin tosses. To some of you this may sound trivial, but it was not the consensus in the scientific community until recently. The results mentioned here may open the door for future studies to address a more important question: what really causes athletes to perform better, and how could they use this kind of knowledge to improve their future performance?