Meaningful criteria for the approximate probability distributions
- The Expected Value (approximated by the Mean or Average) of the given distribution should reflect the result of the aggregate of all the official polls conducted to date. Note: this criterion is only justifiable if the polls can be considered as truly representative of the actual full voting population. This is, of course, a highly debatable assertion, one extreme view being that "opinion polls are meaningless". But for the sake of current argument, the assertion that the polls are indeed representative, will be considered as valid.
- The integral of the distribution (or "area under the probability density function" or "cumulative density function (cdf)") evaluated over the range from 0.5 to 1 (50% to 100%) should correspond to the implied probabilities ("chance of YES winning", and "chance of NO winning", respectively) from the betting exchange data. It could be viewed that this is a considerably less justifiable criterion than the previous, but on the basis that "the bookies are seldom wrong", it seems like quite a reasonable assertion. After all, the betting exchange data is "crowd-sourced" in that it reflects the combined "beliefs" of many thousands of punters, i.e., potential voters. Also, there is no other readily available source of such information, with the exception of the polling data itself (utilised in the previous criterion). However, modelling probability distributions on just the polling data would lead to much narrower distributions (i.e., suggesting considerably less uncertainty) than implied from the betting exchange data. Or put another way, the betting exchange data in effect incorporates the (potentially real) broader uncertainties than captured in the polling data alone.