(This question relates to a problem I had at work a while ago, doing a little data mining at a car rental company. Names changed, of course.)
There was a flight of steps out the front of our building. It had a dodgy step on it, on which people often stubbed their toes.
I had records for everyone who worked in the building, detailing how many times they climbed these steps and how many of those times they stubbed their toes on the dodgy step. In total there were 3000 stair-climbing incidents and 1000 toe-stubbing incidents.
Joe climbed the steps 15 times and stubbed his toes 7 times, which is 2 more than the 5 you'd expect at the overall rate of 1 in 3. What's the probability that this is just random chance, versus the probability that Joe is actually clumsy?
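To make the arithmetic concrete, here is a minimal sketch of one way to frame it, not a definitive answer. It assumes each climb is an independent trial with the building-wide stubbing rate (a binomial model, which is an assumption and not something the records establish), and it ignores the fact that Joe's own climbs are part of the 3000. The function name `binomial_upper_tail` is made up for illustration. Note that it computes how often an average person would stub 7 or more times in 15 climbs, which is not the same thing as the probability that Joe is clumsy.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more stubs in n climbs."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Building-wide figures from the records.
total_climbs = 3000
total_stubs = 1000
p_building = total_stubs / total_climbs  # overall stubbing rate, 1 in 3

# Joe's figures.
joe_climbs = 15
joe_stubs = 7

# Expected stubs for an average person making the same number of climbs.
expected_stubs = joe_climbs * p_building

# One-sided tail probability under the ASSUMED binomial model.
p_value = binomial_upper_tail(joe_stubs, joe_climbs, p_building)

print(f"expected stubs: {expected_stubs:.1f}")
print(f"P(at least {joe_stubs} stubs by chance): {p_value:.3f}")
```

Under these assumptions the tail probability comes out at roughly 0.2, so 7 stubs in 15 climbs would not be unusual for someone of average clumsiness.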
I'm pretty sure from half-remembered Statistics 1 that it's something to do with chi-squared, but it beats me where to go from there.
...
Of course, we actually had several flights of steps, each with different rates of toe stubbing and instep bashing. How would I combine the stats from those to get a more accurate likelihood of Joe being clumsy? We can assume that there's no systematic bias in respect of more clumsy people being inclined to use certain flights of steps.
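One possible way to combine several flights, sketched under stated assumptions: treat each flight as its own binomial with that flight's rate, and look at Joe's total stubs across all flights. The exact distribution of that total for an average person can be built by convolving the per-flight distributions. The per-flight numbers below are entirely hypothetical (only the first row matches the figures above), the names `binomial_pmf`, `convolve` and `pooled_upper_tail` are made up for illustration, and summing stubs is just one simple choice of test statistic.

```python
from math import comb

def binomial_pmf(n, p):
    """Probabilities of 0..n stubs in n climbs at stubbing rate p."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve(a, b):
    """Distribution of the sum of two independent counts with pmfs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def pooled_upper_tail(flights):
    """P(total stubs >= observed total) for an average person.

    flights: list of (climbs, stubs, flight_rate) tuples for one person.
    """
    dist = [1.0]  # zero stubs with certainty before any flight is added
    observed = 0
    for climbs, stubs, rate in flights:
        dist = convolve(dist, binomial_pmf(climbs, rate))
        observed += stubs
    return sum(dist[observed:])

# HYPOTHETICAL per-flight records for Joe: (climbs, stubs, flight-wide rate).
joe_flights = [
    (15, 7, 1 / 3),  # the front steps from the question
    (10, 2, 0.10),   # made-up second flight
    (20, 5, 0.20),   # made-up third flight
]

print(f"P(this many stubs or more by chance): {pooled_upper_tail(joe_flights):.3f}")
```

With a single flight this reduces to the ordinary binomial tail, which is a quick sanity check on the approach.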