Hacker News

Frequentist Statistics:

1. Start with a model that you think describes your data.

2. Find the parameters of the model that make it fit the observed data most closely.

3. If the data really do not seem to fit the model, reject the model. The usual criterion is the p-value, which is the probability, assuming your model is actually correct, of data at least as unlikely as your observations occurring. If the p-value is less than .05 (by an arbitrary and accidental custom) the model is rejected. Wrap your head around that, if you dare.

4. If the model is not rejected, use it to predict future outcomes under the best-fit parameters, ideally allowing for some uncertainty in their values. Decisions are made on this basis. In practice, that parameter uncertainty is quite difficult to allow for.
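The frequentist steps above can be sketched in a few lines. This is a hypothetical illustration, assuming a normal model and a one-sample t-test; the numbers are made up:

```python
import numpy as np
from scipy import stats

# Step 1: assume the data come from a normal distribution.
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)  # stand-in for observed data

# Step 2: maximum-likelihood fit of the model's parameters.
mu_hat, sigma_hat = data.mean(), data.std(ddof=1)

# Step 3: test a hypothesized parameter value (here, mean = 5) with a
# one-sample t-test; reject if p < 0.05 by the usual convention.
t_stat, p_value = stats.ttest_1samp(data, popmean=5.0)
reject = p_value < 0.05

# Step 4: if not rejected, predict with the fitted model, e.g. a 95%
# interval for a future observation (ignoring parameter uncertainty).
lo, hi = stats.norm.interval(0.95, loc=mu_hat, scale=sigma_hat)
```

Since the hypothesized mean here equals the true one, the test will usually (though not always) fail to reject, and the interval gives a rough range for a future observation.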

Bayesian Statistics:

1. Start with a model that you think describes the data.

2. Then get some models to model the parameters of your model, called prior models, describing your uncertainty about those parameters. You get the initial parameters for the prior models from your own head, or the head of a field expert.

3. Use the observed data to refine your estimates of the parameters for the prior models. So you started with educated guesses of these 'prior' parameters, and then you made the guesses better using your observations. In practice, your guesses can become irrelevant very quickly as you add more data.

4. Predict future outcomes using your model, where the uncertainty in the model parameters is modelled explicitly using your prior models as described in 3.

5. Put those predictions of future outcomes into a utility function to make decisions.

Summary: The main thing is that Bayesian statistics allows you to specify models for your parameter uncertainty, provided you are okay with the educated guesses.
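As a concrete (and entirely illustrative) sketch of steps 1-5, here is the textbook conjugate case: a Beta prior on a coin's bias, updated by binomial data, then fed into a made-up utility function. The prior parameters (2, 2) are exactly the kind of "educated guess" described above:

```python
from scipy import stats

# Step 2: prior model Beta(2, 2) on the coin's heads probability p.
alpha, beta = 2.0, 2.0
heads, tails = 70, 30           # observed data (step 1's model is binomial)

# Step 3: conjugate update -- the posterior is Beta(alpha+heads, beta+tails),
# so the data quickly swamp the initial guess.
posterior = stats.beta(alpha + heads, beta + tails)

# Step 4: predict the next flip; the predictive probability of heads is
# the posterior mean of p.
p_heads = posterior.mean()

# Step 5: expected utility of a hypothetical bet paying +1 on heads and
# -2 on tails; decide to bet only if the expectation is positive.
expected_utility = p_heads * 1.0 + (1.0 - p_heads) * (-2.0)
decide_to_bet = expected_utility > 0
```

With 70 heads in 100 flips the posterior mean is 72/104, about 0.69, so this particular bet squeaks out a positive expected utility.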



Because hessenwolf and I are philosophically opposed here, I'll give my take. Balance your impressions between us as you choose.

---

Frequentist statistics attempts to answer the question "How will my experimentation appear knowing that there is some hidden, unknown truth to the world which generates it?" The methods then proceed to use a variety of clever arguments to show that seeing a certain experimental result (considering all possible experimental results) constrains the possible underlying reality and gives you a good guess as to what it is (and allows you to estimate how much it might vary).

Bayesian statistics asks the very different question "How does this observation I'm making affect my current knowledge of the world?" It is pretty difficult to look at the methods without seeing an interpretive nod toward the process of learning. To do this update step, Bayesians consider the relative likelihood of all possible underlying realities given that they've seen said experiment.

It's not clear to me that these two methods are at all asking the same question. In particular, they each consider (marginalize, integrate) vastly different properties and their results have different interpretations. However, since both of them fit into the space of quantifying the effect of observation on the parameters of a model of the world they end up in constant conflict.

Moreover, it's easy to construct Bayesian arguments which correspond to exactly the same algorithms as some Frequentist arguments. Bayesians then argue that their path to reaching that algorithm is more interpretable and clear, especially to non-mathematicians. This method collision serves to further conflate the two methods as enemies.

---

tl;dr: Bayesian statistics averages over possible realities; frequentist statistics averages over possible experimental outcomes. It's not clear that these are comparable at all, but since they often try to answer the same questions we compare them anyway.


Nope. I agree with what you are saying; and would like my research presented as summary statistics under different models (of which the p-value is one) for sciency stuff, and expected utility values under different utility functions and priors for decision making. I think the priors actually really become a moot point after you tack on the utility function.


I think that choice of model (even nonparametric or empirical distributions) and choice of priors are linked. Both are assumptions based on prior knowledge and analytical approach. Both are overwhelmed by the data in a fertile experiment.

Utility functions are a different beast though. They don't have an update procedure and can wildly affect your decision. I'm also convinced they're the best tool we've got so far, so I take it as an illustration that making informed decisions is just hard.

Presentation of summary statistics is fine. I prefer presentation of full, untransformed, unpruned data as well when feasible. It's, of course, often not feasible. I also demand justification for why you think those summary statistics are meaningful and under what kinds of situations they would fail to capture the conclusion presented. Not saying that this isn't done in a frequentist setting, but I think it's harder.


You demand, eh?

Honestly, really, truly, honestly, I never bothered with p-values as a statistician except in two cases. The first is when performing a test for somebody else to go into a standard article format. The second is when automating reports on complex data.

P-values are for people who you do not trust to make decisions. Graphs and arrays of summary statistics from several different models are for statisticians.

Also, I disagree that model choice will be overwhelmed by the data.


Hah, I should really write these with more care. I'd feel entitled to demand, but more meekly expect that there's a bit more trust and convention in scientific publication. Though that can be taken too far.

You're right that model choice can still break your analysis given large amounts of data. I was thinking more in terms of a whole inquiry where large amounts of data will help you to locate a model that extracts the maximal information from your observations. If we're able to keep experimenting forever, we pretty much assume we'll eventually get highly accurate maps of the world.

The primary difference is with utility functions: no matter how long you experiment, they remain exogenous and static.


> I think the priors actually really become a moot point after you tack on the utility function.

Why? It seems that both are essential?


Priors aren't essential in some models when you're looking for an unbiased estimator and you have a complete, sufficient statistic. Please don't ask me to tell you when that will happen.

Utility functions are necessary if you want to make a decision based off your knowledge. If your goal is simply to state "given model M, parameter A most likely takes this value based on experimental data" then you don't need a utility function.

I think hessenwolf's point is that priors and utility functions are both largely unconstrained functions over the state space of parameters that need to be specified based on the experimenter/reviewer/reader's beliefs and values (respectively). Formulating them and making everybody happy is still an open research topic.


Yes, that was my point. :-))


You seem to be conflating quite a few axes of variation. What you're describing as "frequentist" is descriptive statistics using parametric models, which is only one possible way of using a frequentist interpretation of probability.

Nonparametric statistics is probably the biggest active area of frequentist-statistics research, in which case fitting parameters of models isn't exactly what you're doing (though there is some sort of model-building process, and some arguing about what constitutes a parameter).

In addition, predictive statistics is a quite large area of frequentist statistics, and it does precisely what you call "Bayesian" steps #4 and #5, except within a frequentist framework: you fit a predictive model, which may include uncertainty estimates in its predictions, and then feed that model's output through something decision-theoretic, like a risk function.
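A minimal sketch of that frequentist predictive/decision-theoretic path, with made-up numbers and a simple over/under-stocking loss standing in for the risk function (nothing here is from the thread; it's one illustrative setup):

```python
import numpy as np

# Fit a predictive model (a Poisson demand model) to historical data.
rng = np.random.default_rng(1)
demand = rng.poisson(lam=20, size=200)   # stand-in for observed demand
lam_hat = demand.mean()                  # fitted Poisson rate, no prior involved

def risk(stock, overage=1.0, underage=3.0, n=10_000):
    """Monte Carlo estimate of expected loss for a stocking decision,
    simulating future demand from the fitted model."""
    sim = rng.poisson(lam=lam_hat, size=n)
    loss = (overage * np.maximum(stock - sim, 0)
            + underage * np.maximum(sim - stock, 0))
    return loss.mean()

# The decision rule: pick the action minimizing estimated risk.
best_stock = min(range(10, 35), key=risk)
```

The prior never appears; the fitted model plus the risk function alone determine the decision, which is the frequentist decision-theoretic shape described above.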

Bayesian decision theory and frequentist decision theory do look different, but it's not as if frequentists don't have a decision theory (and a real one, not just "use the model if it has a low enough p-value")...


Conflating - yes, hell-yes, I am simplifying as much as possible. I do use steps #4 and #5 within a predictive model including uncertainty estimates; my frequentist step 4 refers to this. The steps aren't completely aligned.

My comment is a description of the difference as it affects me, and not necessarily capturing all of the effects on others. Please do expand...


I suppose to me it's more of a difference of decision-theoretic approaches, which come up with decision rules that make "best" decisions given the data, under certain definitions of "best", versus descriptive-statistics approaches, which aim to summarize the data, test hypotheses, report significant correlations, etc. I can buy many of the arguments for decision-theoretic approaches (especially if you are in fact making decisions), but that doesn't necessarily tell me why I should use a specifically Bayesian decision-theoretic approach.


> If the p-value is less than .05

You meant greater than .05


If the p-value is less than 0.05 you reject the null hypothesis, because it means there was very little chance of observing the data under the model. Then you end up with your alternative hypothesis, usually implying you can fit an extra parameter and go play.

For the sake of explanation, I wrote as if one goes to fit the null hypothesis and can fail. Mine is certainly not a fine example of explication, but it entertained me to write it.


Hold on a second, you originally wrote, "if the p-value is less than .05 (by an arbitrary and accidental custom) the model is rejected." Usually by "model" we mean an alternative to the null hypothesis. So essentially you said: if p < .05, the alternative model is rejected and the null hypothesis is accepted. That contradicts what you just stated in your second post (and what we both agree on).


Nah, you fit the null hypothesis, and if it fails you reject it. You never, ever accept the null hypothesis; there might just not be enough power in the test.


My goof, of course you don't accept the null model (you generally never accept any models -- you only eliminate ones that are worse at explaining the data).

In basic frequentist stats, you usually have two models: a simpler one (usually called the null hypothesis), and a more complex one called the alternative model (what makes it more complex is usually one or more extra parameters). You're usually interested in testing whether the more complex model holds up when compared to the simpler one. You do this most often with a likelihood ratio test: you divide the probability of the data given the null hypothesis by the probability of the data given the alternative model, then compare twice the negative log of the resulting ratio to the expected distribution of that statistic (a chi-squared) assuming the alternative model is false, taking into account the degrees of freedom (how many more parameters the alternative model contains). If it turns out that, under the null hypothesis, the probability of the ratio statistic being larger than or equal to the one at hand is <= 0.05, the null hypothesis is rejected. The alternative hypothesis is not automatically accepted, but it is said to explain the data better than the null model.
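That likelihood ratio test can be sketched for the simplest nested pair -- a normal mean fixed at zero versus a free mean, with known variance -- where the statistic 2*log(L_alt/L_null) is compared against a chi-squared with one degree of freedom (one extra parameter). The data here are simulated for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=0.5, scale=1.0, size=50)  # true mean is nonzero
sigma = 1.0                                     # assumed known

# Log-likelihood under the null (mean = 0) and under the alternative,
# with the alternative's mean set to its maximum-likelihood estimate.
loglik_null = stats.norm.logpdf(data, loc=0.0, scale=sigma).sum()
loglik_alt = stats.norm.logpdf(data, loc=data.mean(), scale=sigma).sum()

# The LRT statistic: twice the log of the likelihood ratio.
lr_stat = 2.0 * (loglik_alt - loglik_null)

# P(statistic >= observed | null), from the chi-squared with df = 1.
p_value = stats.chi2.sf(lr_stat, df=1)
reject_null = p_value <= 0.05
```

Since the alternative's mean is fit by maximum likelihood, the statistic is always nonnegative; large values mean the extra parameter buys a lot of explanatory power.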

Basically everything you wrote is correct; it's just that I misinterpreted what you referred to as "model" to mean "alternative model," while you actually meant "null hypothesis" or "null model." You should have been clearer about that, though, otherwise you can confuse people new to this subject (you already confused me!)


Ha ha. I used to repeat the p value interpretation over and over in different ways to the class in the hopes of eventually explaining what it is not. I did dare you to wrap your head around it, so you were warned!! It is an upside down concept.

You know the value is 0.05 because Fisher had a tabulation of the zeta function lying around with 0.05, 0.025, and 0.01 in it when he was writing the paper?


Ha, I knew 0.05 was just convention and nothing special, but didn't know it originated randomly like that.



