The Strange Case of the Three Wheelie Bins and Elephants in my Garden
Drawing invalid conclusions when (horse racing) data is analysed.
In my garden I have three wheelie bins. Two are green and one is black. Also, I have never seen an elephant in my garden. I could conclude that the unique combination of one black and two green wheelie bins is sufficient to deter elephants from entering my garden. I could conclude that, but it would be incorrect, not to mention a nonsense, given that I live in Cambridgeshire. Even had I lived out in the wilds of Africa, it would still be a nonsense. And yet, such conclusions are frequently drawn when horse racing data is analysed.
Some of my time in the IT industry was spent testing computer systems. One thing that I quickly learned was that people see what they expect to see; they do not necessarily see what they do not expect to see. As a result, even when a system gives an incorrect answer, people often, and surprisingly, perceive it as giving the correct one. To prevent this from happening, I instigated a procedure whereby, before a system was tested, a set of expected results was created for a given input, and the actual and expected results were then compared. In this way, errors were more likely to be detected.
People seeing what they expect to see is also the reason why so many motorcyclists end up in hospital, or worse. When a car driver approaches a road junction and stops, the driver looks right, left and right again, or, at least, is supposed to. Does the driver look for motorcyclists? No, the driver looks for other cars. The driver turns left, or right, and collides with a motorcyclist. But why didn’t he see the motorcyclist before he turned? Because he wasn’t looking for motorcyclists; he was looking for other cars. He didn’t see any other cars and so proceeded, straight into the path of a motorcyclist.
Moving on.
I have noted that, in the main, when people analyse horse racing data, they do so expecting a certain outcome. This approach, in my opinion, is flawed because:
· Only data which confirms the expected outcome is considered.
· Data which refutes the expected outcome is often missed, dismissed or simply ignored.
· Conclusions which are unrelated to the expected outcome are often missed, dismissed, ignored or never even investigated.
In my opinion, the better way is to collect as much data, with as many variables, as possible. When the data is analysed, it is important to keep an open mind, not to expect a certain outcome, and to analyse the data in as many different ways as possible before arriving at any (preliminary) conclusions. Once preliminary conclusions have been reached, the data should be re-analysed to determine whether any evidence refutes them. If no such evidence is found, the preliminary conclusions can be accepted.
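The re-analysis step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the 30% threshold and the function names (strike_rate, refuting_slices) are all made up for the example and do not come from any real data set or tool. The idea is to take a preliminary conclusion ("the strike rate is at least 30%") and then slice the same data by each variable in turn, looking for slices that contradict it.

```python
from collections import defaultdict

# Hypothetical records: (season, surface, won). All values are invented.
RUNS = [
    (2010, "turf", True), (2010, "turf", True),
    (2010, "aw", True), (2010, "aw", False),
    (2011, "turf", True), (2011, "turf", False),
    (2011, "aw", False), (2011, "aw", False),
]


def strike_rate(runs):
    """Winners divided by runners; 0.0 for an empty sample."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(1 for run in runs if run[2]) / len(runs)


def refuting_slices(runs, key_index, threshold):
    """Group runs by one variable and return the groups whose strike
    rate falls below the threshold, i.e. evidence against the
    preliminary conclusion."""
    groups = defaultdict(list)
    for run in runs:
        groups[run[key_index]].append(run)
    rates = {key: strike_rate(group) for key, group in groups.items()}
    return {key: rate for key, rate in rates.items() if rate < threshold}


overall = strike_rate(RUNS)                    # 0.5, so the conclusion looks sound
by_season = refuting_slices(RUNS, 0, 0.30)     # {2011: 0.25}
by_surface = refuting_slices(RUNS, 1, 0.30)    # {'aw': 0.25}
print(overall, by_season, by_surface)
```

In this invented sample the overall figure supports the preliminary conclusion, yet both the season and the surface breakdowns turn up a slice that refutes it. With samples as small as these the figures prove nothing by themselves; the point is only that the refuting evidence has to be looked for deliberately.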
By way of example, approximately two years ago, I came across someone on a horse racing forum who was testing a backing system. Although the details of the system were not revealed, I investigated it and found that it was based upon a particular jockey/trainer combination. Although the system was initially successful, it slowly began to deteriorate. I contacted the system’s creator by private email, stating that I had identified his selection criteria. I also asked the following questions:
- In terms of strike rates, how successful is the jockey when riding for other trainers?
- In terms of strike rates, how successful is the trainer when other jockeys ride for him?
- In terms of strike rates, are there certain types of races in which the trainer/jockey combination is particularly successful?
- In terms of strike rates, are there certain ages of horses with which the trainer/jockey combination is particularly successful?
- In terms of strike rates, are there certain tracks at which the trainer/jockey combination is particularly successful?
- In terms of strike rates, are there certain surfaces (turf/All-Weather) on which the trainer/jockey combination is particularly successful?
I could go on, but I think that I will stop there.
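Each of the questions above amounts to the same calculation, a strike rate (winners divided by runners), grouped by a different set of variables. A minimal Python sketch follows, assuming each run is stored as a dictionary; the field names ("jockey", "trainer", "surface", "won"), the placeholder values J1, J2, T1, T2 and the function name strike_rates_by are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical runs; jockey and trainer names are placeholders.
RUNS = [
    {"jockey": "J1", "trainer": "T1", "surface": "turf", "won": True},
    {"jockey": "J1", "trainer": "T1", "surface": "turf", "won": False},
    {"jockey": "J1", "trainer": "T2", "surface": "aw", "won": True},
    {"jockey": "J1", "trainer": "T2", "surface": "aw", "won": True},
    {"jockey": "J2", "trainer": "T1", "surface": "turf", "won": True},
    {"jockey": "J2", "trainer": "T1", "surface": "aw", "won": False},
]


def strike_rates_by(runs, *fields):
    """Return {group key: (wins, runs, strike rate)} for the given fields.

    The group key is a tuple holding one value per requested field.
    """
    tally = defaultdict(lambda: [0, 0])  # key -> [wins, runs]
    for run in runs:
        key = tuple(run[field] for field in fields)
        tally[key][0] += int(run["won"])
        tally[key][1] += 1
    return {
        key: (wins, total, wins / total)
        for key, (wins, total) in tally.items()
    }


# The same function answers several of the questions in turn.
for fields in (("jockey",), ("trainer",), ("jockey", "trainer"),
               ("jockey", "trainer", "surface")):
    print(fields)
    for key, (wins, total, rate) in sorted(strike_rates_by(RUNS, *fields).items()):
        print(f"  {key}: {wins}/{total} = {rate:.0%}")
```

Adding a field such as race type, age or track to each record would let the same call cover the remaining questions. The more finely the data is sliced, the fewer runs fall into each group, so the number of runs is returned alongside each rate.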
The system’s creator responded to my email and stated that he had only investigated one particular jockey/trainer combination after he noticed that they had a particularly good recent strike rate. He had not broken down the data any further.
The system’s creator eventually ceased posting following a particularly long losing run during which all of the past profits were lost and a lot more besides.
Had the system’s creator investigated a little further, he would have found that the jockey had a particularly good, long-term strike rate on All-Weather tracks with a different trainer. In addition, the trainer had a higher strike rate with certain other jockeys.
The above is an excellent example of what happens when data is analysed with the outcome predetermined and expected. It shows that analysing data in this way can be dangerous, since it can lead to conclusions which prove to be erroneous in the long term. The system’s creator expected to find that the jockey/trainer combination was profitable and, indeed, found just that, whereas, in fact, the combination had only been profitable over a short, past period. He also failed to identify that the jockey in question had a higher strike rate with certain other trainers and an unusually high, long-term strike rate with one trainer in particular, but only on All-Weather tracks. In addition, he failed to notice that the trainer had a better strike rate with certain other jockeys. These facts were missed because the system’s creator concentrated on only one particular jockey/trainer combination.
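The gap between a good recent strike rate and a poor long-term one is easy to check for. The sketch below is a hedged illustration only: the results are invented, the four-run window is an arbitrary choice, and the function name recent_vs_long_term is hypothetical. It is not a reconstruction of the system discussed above.

```python
def recent_vs_long_term(results, window):
    """Return (strike rate over the last `window` runs,
    strike rate over the whole history).

    `results` is a chronological list of booleans, True for a win.
    """
    if not results or window <= 0:
        raise ValueError("need at least one result and a positive window")
    recent = results[-window:]
    return sum(recent) / len(recent), sum(results) / len(results)


# Invented history: sixteen losers followed by a hot recent spell.
RESULTS = [False] * 16 + [True, True, False, True]

recent_rate, long_term_rate = recent_vs_long_term(RESULTS, window=4)
print(f"recent: {recent_rate:.0%}, long-term: {long_term_rate:.0%}")
# prints "recent: 75%, long-term: 15%"
```

Anyone looking only at the last four runs would see a 75% strike rate, whereas the full record shows 15%.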
Psycho
www.laythepsychicway.com
www.psycholaysdogs.com
www.psychosshortlays.com

