What are people in this thread’s thoughts on p-hacking and the replication crisis in the social sciences? The evidence being presented here seems very similar to p-hacked evidence for unreplicable findings, and the way to reduce p-hacking (pre-registering hypotheses) is pretty much what I’m asking for when I ask for a concrete, public, and risky prediction.
The p-hacking problem as I understand it: a p-value is basically a measure of how unlikely an outcome would be if it were due to chance alone. If two measured variables are unrelated but noisy, there’s some chance that they’ll look related due to random fluctuations. The more measurements you make, the less likely that is. For a result to be statistically significant, it has to be very unlikely that the result is due to noise, and that requirement is expressed in terms of ‘p’.
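That first point – unrelated noisy variables can look related by chance, and more data makes this rarer – can be checked with a quick simulation. This is just a sketch: the sample sizes, the 0.5 correlation cutoff, and the trial count are illustrative numbers I made up, not anything from the thread.

```python
import random

def correlation(xs, ys):
    # Pearson correlation of two equal-length lists
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def false_link_rate(n, trials=2000, threshold=0.5, seed=1):
    # how often two *independent* noise variables of length n
    # happen to show a correlation stronger than the threshold
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if abs(correlation(xs, ys)) > threshold:
            hits += 1
    return hits / trials
```

With only 10 measurements per variable, spurious “relationships” past the cutoff show up fairly often; with 100 measurements they essentially never do.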
For example, if we want to relate a specific demographic (D) to a target concept (T), we can survey a bunch of people and ask about their demographic information and about the target concept. We then look at all the surveys and see how they’re related. And we can say in advance that there’s a certain likelihood, (L), that, given the number of surveys we have, we’ll get a false positive – that through random noise we’ll find that (D) is related to (T).
The problem arises when many data points are measured, such as on a survey that asks for lots of demographic information and then lots of questions about a target concept. Now we have a bunch of different relationships, between (D_{1}), (D_{2}), (D_{3}), etc. and (T_{1}), (T_{2}), (T_{3}), etc. Let’s say that for any pair (D_{i}) and (T_{j}), there’s a likelihood of (L) that we would get a false positive. What happened in practice is that researchers took the data, looked at every relationship, found one pair that seemed far too related to be chance, and said, “look, (D_{i}) and (T_{j}) are related, and we’re super sure about it!”
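That scan-every-pair move can be simulated directly: build a fake survey where every demographic column and every target column is pure independent noise, then report the strongest correlation found across all pairs. The column counts, sample size, and trial count below are made-up illustrative choices.

```python
import random

def correlation(xs, ys):
    # Pearson correlation of two equal-length lists
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def best_of_grid(num_d=10, num_t=10, n=50, seed=0):
    # every column is independent noise: any "relationship" is spurious
    rng = random.Random(seed)
    ds = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(num_d)]
    ts = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(num_t)]
    # the p-hacker's move: scan all D_i, T_j pairs, keep the strongest
    return max(abs(correlation(d, t)) for d in ds for t in ts)

def single_pair(n=50, seed=0):
    # the pre-registered version: one D and one T, chosen in advance
    rng = random.Random(seed)
    d = [rng.gauss(0, 1) for _ in range(n)]
    t = [rng.gauss(0, 1) for _ in range(n)]
    return abs(correlation(d, t))

# averaging over repeated fake surveys, the scanned "best" correlation
# is systematically much stronger than the honestly pre-chosen one
trials = 200
avg_best = sum(best_of_grid(seed=s) for s in range(trials)) / trials
avg_single = sum(single_pair(seed=s) for s in range(trials)) / trials
```

Even though nothing in the data is actually related, the cherry-picked best pair looks impressively correlated every single time.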
But that math doesn’t actually work out. Let’s say there’s a 1 in 1000 chance that we’ll get a false positive for any given (D_{i}) and (T_{j}). If our data set allows us to make 100 (D_{i}), (T_{j}) pairs, then the chance that we’ll get at least one hit somewhere in the grid becomes roughly 1 in 10, not 1 in 1000. If we treat any relationship we find as though we had only asked about that specific (D_{i}) and (T_{j}), we’ll be misled as to how strong the connection really is.
Does that make sense? Am I incorrect in my description of the problem? Do you agree that it’s a problem? Do you see the parallels between it and the experiences described in this thread?
I am curious about your notion of what it means to make a truth claim. For me, a statement can only be said to be true if its falsity could, at least in principle, be demonstrated. For example, my claim, “none of you are psychic”, can be falsified by a concrete, public, risky prediction that comes true. What can falsify your claims here? Or do you reject falsifiability as a necessary criterion for meaningful truth claims?