This seems to boil down to a criticism of the staging area, so it is strange that its purpose is never clearly explained in the paper. The reason the staging area exists is, I think, that for larger teams working long-term on a codebase it is crucially important for the version-control history to be very, very neat: logically separate changes in separate commits, a clean commit message for each change, the right changes going to the right branch (e.g. you don't want to commit a change that can go live immediately to a branch that's going to be released in three months), and so on. That's also why Git makes such a big deal of rebase. I guess Linus spends a lot of time getting people to use the VCS right even after the changes themselves are more or less right, and it is thanks to rebase and the distributed model that he is able at all to make corrections related to version control and branch/release management before changes enter the main repository.
People aren't really good at remembering those things up front; that's why Git introduces the staging area, so that you can work as usual and only after you are finished with whatever was occupying your mind consider splitting your work into nice commits, which can be quite a task in itself to do right. If you remove the staging area and want to incrementally build up a few commits from the changes you made, you end up passing the same long list of parameters to command after command, first to git diff, then to git commit, and it's easy to make a mistake and diff something other than the changes that will actually be committed in the end.
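A minimal sketch of that staged workflow in a throwaway repo (the file names and commit messages here are invented for illustration): the index is set once, and every later command agrees on which changes are in play.

```shell
# Sketch: three unrelated edits in one working tree, split into two tidy
# commits via the staging area. All names below are made up.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email demo@example.com
git config user.name demo
touch parser.c lexer.c README
git add . && git commit -q -m "base"

echo 'parser fix' > parser.c       # three edits pile up in one working tree
echo 'lexer fix'  > lexer.c
echo 'notes'      > README

git add parser.c lexer.c           # stage the first logical change once
git diff --cached                  # review exactly what will be committed
git commit -q -m "parser: handle empty input"

git add README                     # the leftover edit becomes its own commit
git commit -q -m "docs: note the new behavior"
git log --oneline                  # base plus two tidy commits
```

Without the index, the `parser.c lexer.c` pathspec would have to be repeated for the diff and again for the commit, which is the mistake-prone repetition described above.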
A lot of people, especially in small teams, use version control very sloppily, and then get confused about conflicts, find changes hard to track down in history, etc. Remember that Git was built for maintaining Linux, which has an absolutely huge number of people working in parallel; in that case you really, really have to care about using the VCS tidily and actually understanding its concepts very well, not just churning away commits, or you will simply fail to integrate the changes correctly. So, frankly, while I like the general discussion in this paper and its approach, it seems to me a bit confused with respect to Git, and I wonder whether the authors have any experience doing long-term software development in a team, especially doing software integration and using a VCS for that purpose. Once you have a few people, or more than one team working on a project, a few testing servers, and a few different release branches, the concepts in Git make a lot of sense.
Staging seems like a perilous way of creating clean commits. If you make a commit out of only some of the changes in your working tree, the commit records a filesystem state that never existed in your working directory, and thus was very likely never tested as committed.
I work on a huge codebase using Git together with thousands of other developers. The staging area is a very welcome tool, because often I have to do little changes that I do not want to commit, most often working around some little mistakes by another developer on the other side of the world.
Fixing a small problem in a Makefile, adding a few debug prints somewhere, disabling an unrelated failing assert, and little unrelated fixes like that. It does not make sense to fix and commit them because that would be a distraction and duplicate work because someone is most likely working on it already. But I do need to make my builds pass and my tests run to get on with my work, using little fixes I do not want to include in my commit.
The staging area may be confusing to newbies, but it is very useful in large projects with a big team.
I'm not an expert Git user, but I'm impressed by gitless after reading this document.
The use case you refer to is one I frequently have. But I don't think staging is the right solution for it. For one, it's limited to file granularity; for another, your local changes can be lost by a reset.
So it seems preferable if we could store our local changes as commits in a branch. That way we can easily identify these local changes and add new ones.
When we commit we should be able to specify if the commit is just a local change to fix something for a test or a change to be published. I think git has the tools to do it with cherry pick, but maybe there is a simpler way.
Ah, you've made one conceptual error here that explains everything, and it's hiding right at the end: "tested as committed" implies that committing is some big special thing you do at the end after all your work is finished. You probably picked this idea up from cvs.
This is not how git is intended to be used and you'll hit a lot of friction if you try to work this way. You should be committing before testing - commit early, commit often, commit everything. git has the idea of easy, painless, zero-risk commits built in from the ground up. You never inconvenience or screw yourself over by committing. You can always edit a commit.
The workflow you are expected to use with git is: code, commit, code, commit, edit commits into a presentable form, test, edit commits more to fix issues, push, ask people to review and pull.
You can fix your conceptual model by replacing every instance of your notion of "commit" with "push". Pushing is the operation that you are expecting commit to be.
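The "commit early, edit the commit later" idea above can be sketched in a throwaway repo (file names and messages invented): an early work-in-progress commit is revised with `--amend` before it is ever pushed anywhere, so nothing is inconvenienced by it.

```shell
# Sketch: a zero-risk early commit, later edited in place with --amend.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email demo@example.com
git config user.name demo
echo 'draft' > notes.txt
git add notes.txt && git commit -q -m "wip"        # early, zero-risk commit
echo 'polished' > notes.txt                        # keep working
git add notes.txt
git commit -q --amend -m "notes: polished draft"   # edit the commit, not a new one
git log --oneline                                  # still one presentable commit
```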
This doesn't really have anything to do with my point. Even if you later edit commits before pushing them, it is still the case that any partially-staged commit that you do ultimately push will reflect a filesystem state that never existed in your working directory (unless you manually checkout the intermediate commit later as a "detached head" and test that).
Maybe some people don't care that every public commit can actually build and run tests, but any broken intermediate tree will break git-bisect or similar tools.
As a matter of fact, manually checking out all the intermediate commits and building/testing those is precisely what our team does.
Except, it's not done manually, we have a set of tools that walks along a branch and tries to build each revision, storing the build/test status using the commit's SHA-1 hash as a key so that we don't waste effort rebuilding things unnecessarily. Then, this is used for our code reviews to verify that each commit on the branch to be integrated has a clean build/test status saved -- or an explanation why not.
We've given some thought to writing a version of git bisect that takes this cached data as input to select better trees to try given some set of known broken commits, but that hasn't happened quite yet.
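A hypothetical sketch of such a tool: walk every commit on a branch, run a check command, and store pass/fail keyed by the commit hash so nothing is rebuilt twice. The demo repo, the `make check` target, and the cache location are all invented here; the team's real tooling is not shown.

```shell
# Sketch: per-commit build/test cache keyed by SHA-1 (all names invented).
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q && git checkout -q -b main
git config user.email demo@example.com
git config user.name demo
printf 'check:\n\t@true\n' > Makefile            # a trivially passing "suite"
git add Makefile && git commit -q -m "base"
git checkout -q -b feature
for n in 1 2; do
  echo "change $n" > "file$n"
  git add "file$n" && git commit -q -m "commit $n"
done

cache=.git/build-status && mkdir -p "$cache"
for sha in $(git rev-list --reverse main..feature); do
  if [ -f "$cache/$sha" ]; then
    echo "$sha cached: $(cat "$cache/$sha")"; continue   # skip rebuilds
  fi
  git checkout -q "$sha"                         # detached HEAD at the commit
  if make check >/dev/null 2>&1; then s=pass; else s=fail; fi
  echo "$s" > "$cache/$sha"                      # result keyed by SHA-1
  echo "$sha $s"
done
git checkout -q feature
```

Because the key is the commit hash, rebasing or amending a commit naturally invalidates its cached status, which is exactly the behavior you want.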
In theory that would seem like a problem, but I've never run into that problem in practice. I think this is for two reasons:
1. It actually makes for _cleaner_ commits, because you can group changes in whatever order/grouping you see fit. So, if you need to group part of file A, all of file C, and part of file D into a commit, you can.
2. Committing changes and publishing those changes are two separate actions. So, you are free to group your work in whatever way you think makes sense, and then publish all those changes collectively. So, if you have 16 commits, you can push all 16 of those commits together. This reduces the odds a fellow committer gets a partial changeset.
Of course, git is a powerful tool, and it will happily let you commit and push a partially finished changeset. Great power, great responsibility, yada yada yada. See http://git-scm.com/book/ch6-4.html.
> So, if you have 16 commits, you can push all 16 of those commits together. This reduces the odds a fellow committer gets a partial changeset.
Your argument here is that it doesn't matter if intermediate trees are broken, because no one will actually use the intermediate trees. But this approach will break anything that does use every intermediate tree, like an automated testing suite or git-bisect.
EDIT: this is a bad downvote. I have accurately paraphrased the parent's point, and made a valid counterpoint. If you're downvoting me, you probably didn't understand what I am saying.
The idea is that you do not give intermediate broken commits to things that you do not want to have broken commits. That is why "rewriting history" is so great. Commit early and often, then publish only working code. Have your cake, and eat it too.
> The idea is that you do not give intermediate broken commits to things that you do not want to have broken commits.
But how do you know if the intermediate commits (aka the ones you want to actually publish) are broken or not? If you partially staged them, you don't know because the tree you committed never existed in your working directory. The only way to know at that point is to checkout every intermediate state later (aka after the commit) and test it then.
You test "intermediate" commits the same way you test any other sort of commit. They are just commits, not something special.
You don't seem to be comprehending the divide between committing something, and actually giving that something to the world. In git it is perfectly natural to test after a commit. If it doesn't work, you just edit the commit. Of course testing after pushing should be treated delicately...
> You test "intermediate" commits the same way you test any other sort of commit. They are just commits, not something special.
An "intermediate" staged commit is special, as I have tried to explain many times now. If you stage a commit by using git-add to include only some of the changes in your working directory, you are committing a filesystem state that never existed in your working directory prior to committing it.
You can disagree that this matters, but you cannot disagree with the above statement. It is simply a fact.
Let's walk through a scenario. Suppose I am at the point in development where I actually want to create the commits that I will publicly push (ie. this is not a "commit early, commit often" scenario). All of my final changes are in my working directory, but for cleanliness I want to break them up into several commits by staging them.
$ git status
<git prints a bunch of "Changes not staged for commit">
$ git add foo.c bar.c
$ git status
<git prints foo.c and bar.c as "to be committed", baz.c
is still "not staged for commit">
$ git commit
At this point I have committed my changed foo.c and bar.c, but the commit does not reflect my changes to baz.c. That means that I have not actually tested that the tree state at this commit works. If I run my tests right now, they will not reflect whether the commit is broken or not, because my working tree still includes my uncommitted and unstaged changes to baz.c.
I do have a couple of options now. I can stash my changes to baz.c and run the tests; then "stash apply" them if the tests pass. This is probably the best option for ensuring that my staged commit is not broken. I can also commit my changes to baz.c, then checkout HEAD^ and run the tests, but this solution forces me to mentally keep track of which commits I have tested; if there are many commits in the series, this is burdensome.
Both of these solutions are entirely possible; I'm not denying this. The "stash" solution is probably the best option for this. But that said, it is fundamentally true that when you (partially) stage a commit, you can't test your commit prior to committing it. So when you write your commit message and mentally perform the process of deciding that you like this commit enough for public consumption, you don't actually know if your tree is broken or not. Even though there are workarounds, I still think this is a bit clumsy.
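One scripted form of the stash workaround is `git stash --keep-index`, which shelves only the unstaged half of the working tree; after it runs, the working directory matches exactly what the staged commit will contain, so the tests exercise the real commit. A sketch in a throwaway repo (file names invented):

```shell
# Sketch: test the exact tree being committed by stashing unstaged edits.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email demo@example.com
git config user.name demo
printf 'v1\n' > foo.c; printf 'v1\n' > baz.c
git add . && git commit -q -m "base"

printf 'v2\n' > foo.c            # change to be committed
printf 'v2\n' > baz.c            # unrelated local change

git add foo.c                    # stage only foo.c
git stash -q --keep-index        # shelve the unstaged baz.c edit
cat baz.c                        # prints "v1": tree now matches the commit
# ... run the test suite here, against exactly what will be committed ...
git commit -q -m "foo: v2"
git stash pop -q                 # restore the baz.c change afterwards
cat baz.c                        # prints "v2" again
```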
The point is that any test system that allows you to test commits will also allow you to test intermediate commits, since they literally are the same thing. A commit is a commit, it doesn't know if it is pointed to by a ref only, or if other commits point to it.
Making sure that you are testing with clean checkouts is important whether or not you are testing an "intermediate" commit. Hell, if you were using SVN, you could have ignored files that alter the behavior of the tests. Being wary of that sort of thing is important no matter what system you use, it isn't some special property of "intermediate" git commits.
> The point is that any test system that allows you to test commits will also allow you to test intermediate commits
Sure, but such a "test system" is not a part of Git. For people who are using Git simply, with a repo on GitHub or similar and no custom infrastructure, the only "test system" they have is running their own tests manually before they push.
> Making sure that you are testing with clean checkouts is important whether or not you are testing an "intermediate" commit.
Yes, but the functionality of partial staging is fundamentally opposed to this, because it explicitly creates a commit that did not come from a "clean checkout." However I am giving up on the hope that you (or anonymous downvoters) will acknowledge or accept this simple point.
I'm a Git fan actually, but it's tiresome to debate with people who can't see both the plusses and minuses of their tools.
But you can make broken commits just as easily with svn - I've seen people commit all changed files, forgetting to add a new file and making a broken commit. You can just as well commit any files you want and make as much of a mess as with git. Except with git you can test your commit before it's pushed to the server.
> But you can make broken commits just as easily with svn - I've seen people commit all changed files, forgetting to add a new file and making a broken commit.
If you do that, you've made a mistake. No tool can save you from all mistakes (though they can help you out with warning messages and such).
Partially-staging a commit in Git is doing the same thing on purpose. It's functionality that explicitly helps you make this mistake.
if you don't trust yourself to make that judgement, try this workflow:
stage, commit, stash
now your working copy does match the state you committed, and you can test away to your heart's content. if you find a problem, fix it, and commit --amend. keep this cycle going till you're done.
stash pop, carry on working
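The cycle outlined above, run end-to-end in a throwaway repo (file names invented): stage, commit, stash the rest, test against the committed tree, amend if the tests find something, then pop and carry on.

```shell
# Sketch: stage, commit, stash, test, --amend, stash pop.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email demo@example.com
git config user.name demo
printf 'v1\n' > app.c; printf 'v1\n' > hack.c
git add . && git commit -q -m "base"

printf 'v2\n' > app.c      # the real change
printf 'dbg\n' > hack.c    # local debug tweak, not for publishing

git add app.c && git commit -q -m "app: v2"   # stage, commit
git stash -q                                  # stash the leftover hack.c edit
# working tree now equals the committed tree; run tests here
echo 'v2 fixed' > app.c                       # suppose the tests found a bug
git add app.c && git commit -q --amend --no-edit   # fold the fix in
git stash pop -q                              # carry on working
```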
almost every time i see a criticism of git it's about the way you use it not the tool itself.
>almost every time i see a criticism of git it's about the way you use it not the tool itself.
In my point of view, you cannot separate "the way you use the tool" from the "tool itself". A hammer is easy and straightforward to use because of its design features; that is what makes the hammer useful. Granted, there are complicated tools that require a steep learning curve (a pipe organ, for instance), but that should not be the case for git.
The whole point is that the tool should make it easy for you to do what you intend to do. That's the whole point of criticizing an application (or tool, as you put it). I am pretty sure that you can make beautiful and exact things with Git, but the fact is that sometimes they are difficult to perform or counter-intuitive, and that's the crux of the criticism.
A tool should not be designed only to "allow" people to do certain things. It should also make these things easy and straightforward.
It's impressive how (most of the time) our usage of a tool is directly linked to how it was designed. Therefore, design features (like the ones proposed in the article) cannot be distinguished from the "core" of the tool, or the functionality it offers. The design, in a sense, is the tool, and that's what conditions our usage of it.
> The whole point is that the tool should make it easy for you to do what you intend to do. That's the whole point of criticizing an application (or tool, as you put it). I am pretty sure that you can make beautiful and exact things with Git, but the fact is that sometimes they are difficult to perform or counter-intuitive, and that's the crux of the criticism.
I'm not sure I agree. There are tools that are inherently difficult because the problems they attempt to help with are inherently complex: architecture, MRIs, corporate taxation, managing pilot and crew schedules for airlines, etc.
Managing source code for any system of sufficient complexity falls squarely into this domain. Git tackles this -- nicely, I would argue. Among other needs, VCS's need to separate code changes into manageable chunks, store them in a compact manner, and be able to distribute those changes efficiently over a network.
Git handles these quite nicely. Separately, you would like developers to have the ability to commit changes in small, related chunks, all while simultaneously preventing conflicts -- or at least making them difficult. Git does this as well.
> A tool should not be designed only to "allow" people to do certain things. It should also make these things easy and straightforward.
Again, I'm not sure I agree with this premise for all cases. Tools should be as complicated as they need to be, and no more. The basic workflow behind git -- add, commit, pull/push -- is not overly complicated, and I must be honest in admitting that it puzzles me when it is otherwise claimed. Is it easy? Apparently not for some. My personal path was CVS, SVN, MKS, Perforce, then git, and it did not take me long to understand the benefits of git over the others I had used.
It was pretty straightforward. Different, but hardly intractable, especially for a tool which is so singularly important to me as a developer. In that case I do not mind complexity, given the flexibility that is gained and, frankly, since it's what I do for a living.
>In my point of view, you cannot separate "the way you use the tool" from the "tool itself".
Love this. The question then becomes: do you change the tool, the way you use it, or some of each?
>A tool should not be designed only to "allow" people to do certain things.
You lost me here, because that's the definition of design. There are many hammer designs, some more general-purpose like the claw hammer, and many others (tack, 2 lb) with a more limited intended purpose. Git is designed for very large-scale teams, and staging seems to be integral to that. The paper asserts the complexity of staging is unneeded, but in my view doesn't adequately demonstrate that to be true for large teams. The paper goes on to describe an alternate, simpler product which better meets the needs of simpler users, using an alternate, incompatible conceptual design. This is fine, but I'd be more impressed if it started by fully demonstrating that a concept is not needed for the intended users before removing that concept, and then described a pathway from the simpler concept to the fully complex conceptual model.
a fair point, i'd argue that it's a bug in stash's behaviour that disallows this. personally i've had experience with stash working ok in this scenario often in the field, so perhaps only certain split files cause the issue. most likely when the divergence of the stash and the commit are in very close proximity in the file.
worth noting here is that stash behaviour has improved a lot in more recent versions, e.g. stash pop can merge with your working tree now, whereas a while back it would just fail to apply and you'd be stuck having to do a patch and apply instead.
i think the workflow i outlined is intended to work correctly and eventually will for all cases rather than just the majority. hopefully that doesn't scare people away.
out of interest do you know of any other way to do a similar kind of thing with any other tool? i find having to deal with these edge cases still way better than the alternative of having no staging area and only being able to work on 1 thing at a time personally.
That is an argument that can be resolved, thereby making staging a nice way of creating clean commits. The developer should simply build and unit-test at least the project(s) directly involved in the commit; apart from that, the CI system should take care of the rest. Yes, if you consider git as a standalone tool your argument makes sense, but how often is git used like that, and not as part of a complete build/test system?
I am talking about one of two typical situations: either you want to commit all your changes to a single branch, but in more than one commit, or you want to distribute your working-directory changes between two different branches. In the first case the problem does not occur; in the second, you can run the tests on both branches after doing the commits and rebase before pushing if necessary, or you can use git stash --keep-index. Good unit-test coverage and CI infrastructure, a good idea regardless, should also help to stay out of trouble. If you keep in your working directory serious changes that are not meant to be committed at all but are used during tests, then Git can't be blamed for that, I guess.
Why? You can stage the changes you want to commit, and stash the unstaged changes. Run tests, &c. Commit locally but do not push. If you test more and it's still broken, you can iterate on that with git commit --amend. The rest of your change will happily live on in the stash.
As others have pointed out, nobody sees your change because it's still only local. That means you could make 10 tiny commits as you work. Then, later on, rebase and squash a series of smaller commits into a bigger one, and then push, send a pull request, whatever.
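A sketch of that squash step in a throwaway repo: interactively one would normally use `git rebase -i`, but the same collapse can be scripted with `git reset --soft` (commit messages and file names here are invented).

```shell
# Sketch: collapse three tiny local commits into one before pushing.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email demo@example.com
git config user.name demo
echo base > f && git add f && git commit -q -m "base"
for n in 1 2 3; do
  echo "step $n" >> f
  git commit -q -am "wip $n"          # tiny local commits while working
done
git reset -q --soft HEAD~3            # keep the tree, drop the 3 wip commits
git commit -q -m "feature: all three steps"
git log --oneline                     # base plus one squashed commit
```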
> You can stage the changes you want to commit, and stash the unstaged changes.
Yes, I have acknowledged that this is a solution to the problem (https://news.ycombinator.com/item?id=6986000). It seems a bit roundabout and clumsy to me though, in the sense that it requires vigilance to do the right thing. Things that require vigilance are easy to forget or do incorrectly.
I haven't looked at Gitless yet, but I hope that it makes this a bit more streamlined.