Conversational-Git Can Print
Alan Hohn
October 19, 2013
Conversational Git
Chapter 1
Introduction
These friends are smart people, and if they’re not convinced about Git, the
problem is not them; it’s that they haven’t seen the right argument yet.
There’s so much content out there about Git, and much of it is written at a
level that’s way higher than my expertise. But in a way, that’s an issue.
When you’re first starting out learning something, the questions that you
have are way different from the questions an experienced person has. Once
you’ve won that knowledge, it’s almost impossible to go back and think
about what it was like when you were first learning. That puts you in a bad
position to explain to someone else who’s brand new.
Git seems particularly prone to this because it’s based on some pretty
complex notions of how to think about version control. In particular, once
you internalize the concept of the Directed Acyclic Graph (DAG) that
underlies basically everything in Git, you tend to want to explain that to
new people because (a) it can help you think about how Git works; and (b)
it’s cool. Unfortunately, teaching Git from a DAG perspective is IMHO the
worst way to teach it to new users because it suggests to them that they
have to thoroughly understand complex concepts from graph theory to use
Git effectively. There’s also no question that the Git help pages use Git-
specific jargon, which really interferes with non-experts understanding what
a command does.
Style and Motivation
I’m hoping in this book to adopt a style that will be accessible to new users.
I’m writing in an informal style, with plenty of first- and second-person
references. This is not a “dummies” book; I’m not going to talk down to
you, and I’m not going to suggest that you shouldn’t learn complex
concepts about Git. But I’m going to try to talk about how I use it and how I
see it being used effectively.
I’m making a few assumptions about readers. The first is that readers know
in general what version control tools are for, and therefore what Git is for.
The second is that a reader of this book is familiar with Subversion or CVS,
and is interested in knowing how a “distributed” version control system,
and Git in particular, is different. If those assumptions don’t apply to you, I
hope you still get value, but you might not get as much.
I’m calling this book “Conversational Git” both because I’m aiming for a
conversational style and because, when learning a new language, a key goal
is to be “conversational” – able to make basic small talk, even if not quite a
native speaker.
Because I wrote this book in a conversational style, it’s verbose (like me!)
and breezy. So I hope it’s a quick read. I do include a bunch of Git
commands in here. If you choose to follow along running those commands,
you’ll need to be consistent because some later things are based on some
earlier things. However, when I read tutorials or books like this, I hate
having to follow along typing commands, so I tried to write in such a way
that you can just “get” what the commands are doing from the context. So
don’t feel like you have to follow along to follow along, if you get my
meaning.
For the same reason, the ratio of Git commands to text is somewhat lower
than a typical tutorial. I’d rather spend a paragraph explaining or providing
motivation for a one-line command than present all the possible switches
for all the Git commands. That’s a different book, and Scott Chacon has
done a much better job writing that one than I could anyway.
Why Not
I’m not writing this book to argue against Subversion in favor of Git. Like I
said, I’ve used Subversion heavily for many years, and I still advocate for it
when people are looking for version control tools. Where I compare
Subversion and Git, it’s an attempt to discuss tradeoffs from the perspective
of someone who likes both tools but tends to use Git by preference.
I’m also not writing this book to advocate for Git versus Mercurial or
Bazaar. I’m not qualified to write that book.
Finally, I am not writing this book to refute people’s complaints about Git.
In fact, one of the reasons I wanted to write it is because of Steve Bennett’s
10 things I hate about Git, because I agree with him! Using Git is not pain-
free; I just happen to think it’s worth it.
Dogfooded
We’ll start the next chapter momentarily, but first, I want to point out that
this book is dogfooded. It’s hosted as a Git repository on GitHub. So you
can fork that repository and get your own copy of this book to modify. If
you make changes, you can send me a pull request so I can merge your
changes into my version. That whole workflow is an essential part of why
Git has become so popular for open-source projects, and a key purpose of
this book is explaining that workflow and why it’s so powerful.
The book is written using Markdown and processed using Jekyll. Much
love to both those technologies and to GitHub Pages.
Chapter 2
Setup and Committing
Setting up Git
I’m not going to spend a lot of time talking about installing Git. The
audience for this book is familiar with installing software and using a
command line, and Git is sufficiently available on various operating
systems that I’ll just assume that git --version on the command line
doesn’t return command not found. I’m also going to assume UNIX syntax
for other commands.
I’m also not going to talk about GUIs. There are great GUIs out there for
Git, and of course support is built into IDEs. But the concepts are the same
and the command line is clearer for learning purposes.
One setup item that is important is telling Git who we are. Git keeps
permission to modify a repository (which might be a user name and
password or an SSH key) separate from the information about who made
the change. If we don’t configure it, Git will take that “author” information
from our current username. That works but it’s more polite to others to
make it accurate.
Setting user.name and user.email with git config --global creates a plain
text file called .gitconfig in your home directory, with a couple of lines
in it.
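That setup step might look like the following sketch. The name and email are placeholders, and pointing HOME at a throwaway directory is my own assumption so the example doesn't touch your real settings; skip that line to configure your actual account.

```shell
# Assumption: use a throwaway home directory so real settings stay untouched.
export HOME="$(mktemp -d)"

# Placeholder identity; substitute your own name and email.
git config --global user.name "Harry Example"
git config --global user.email "harry@example.com"

# The settings land in a plain text file in the home directory.
cat "$HOME/.gitconfig"
```

The file is ordinary text, so you can also open it in an editor and change it by hand.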
Creating a repository
In a scratch directory, type:
git init repo
Assuming the directory repo didn’t exist, this makes it and puts a directory
in it called .git with a bunch of files. This .git directory is the repository;
we can move that directory anywhere else we want on the file system and
that new location will become a Git repository by virtue of having that .git
directory in it.
The .git directory has a file in it called config. This file works just like the
.gitconfig in our home directory, but it applies only to this repository. You
can, for example, have a different name and email address for a given
repository.
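As a small sketch of that per-repository override: the email address below is a placeholder, and the scratch directory is only there to keep the example self-contained.

```shell
scratch="$(mktemp -d)" && cd "$scratch"      # throwaway scratch directory
git -c init.defaultBranch=master init -q repo
cd repo

# Without --global, the setting is written to this repository's .git/config.
git config user.email "harry@work.example.com"

# Read it back, then look at the file itself.
git config --local --get user.email
cat .git/config
```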
Other than the .git directory, the rest of the content is ours to play with.
This is our “working copy”, and the concept is pretty much the same as
Subversion.
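One other important Git configuration file is .gitignore, which lists files
that Git should leave alone. It’s a regular file, and you can have it at any
level of the repository. Each line in the file is a pattern. As long as the
pattern contains no / other than a trailing one, it applies recursively; a
leading / ties it to the directory that holds the .gitignore file.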
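Here's a quick sketch of the difference between a recursive pattern and an anchored one. The file and directory names are made up for illustration; git check-ignore reports which paths Git would ignore.

```shell
scratch="$(mktemp -d)" && cd "$scratch"      # throwaway scratch directory
git -c init.defaultBranch=master init -q repo
cd repo

# "*.log" applies at every level; "/build" only matches next to this .gitignore.
printf '%s\n' '*.log' '/build' > .gitignore

# Hypothetical files to test the patterns against.
mkdir -p build sub/build
touch debug.log sub/debug.log build/out sub/build/out

# These three paths are ignored; -v shows the pattern responsible for each.
git check-ignore -v debug.log sub/debug.log build/out

# Whatever still shows up here is not ignored (sub/build/out, for one).
git status --short
```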
Committing
For the purpose of this book, all the changes we make will be small and
silly. We may as well begin that way:
cd repo
echo "Hi there" > README
git add README
git commit -m "Initial"
That commit happened entirely on our own machine, with no server involved.
For me, this is the first thing about Git that I miss when I use a different
tool: “What do you mean I need to be connected to a server to commit
changes?” This also presents an important “thinking difference” with Git.
Your commits should be really, really small. There should be one commit
for each independent idea in whatever work you’re doing. Do I obey that
rule? No, I don’t; look at my logs and see for yourself. But it’s a good rule.
There are lots of other ways to organize things in Git, so we don’t have to
try to fit whole features into a single commit.
Staging
In fact, because Git wants your commits to be small, it behaves exactly
opposite from Subversion. In Subversion, if you make changes to a
repository and type svn commit, by default it will pick up all the changed
files and assume you want to commit them. Git instead wants you to
“stage” a file to show that you want it included in the commit. Nothing is
staged for you automatically. Unfortunately, staging a file or files uses git
add which kind of overloads that term, especially for Subversion users.
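To see this in action, here's a minimal sketch. I'm assuming a changed README and a new file called content01 (the name is borrowed from later chapters; the content is made up), and the identity variables are placeholders so the commit works anywhere.

```shell
scratch="$(mktemp -d)" && cd "$scratch"      # throwaway scratch directory
# Placeholder identity so commits work even where Git isn't configured yet.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git -c init.defaultBranch=master init -q repo
cd repo
echo "Hi there" > README
git add README
git commit -q -m "Initial"

# Change a tracked file and create a new, untracked one.
echo "More text" >> README
echo "First content" > content01

# Nothing is staged, so this commits nothing and exits with an error status.
git commit -m "README and content" || echo "Nothing was committed"

# " M" marks a modified but unstaged file; "??" marks an untracked one.
git status --short
```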
If we change a tracked file, create a new one, and then try to commit
without staging anything, nothing happens except that Git tells us that we
have changes that are not staged as well as an untracked file. This is
irritating; there’s no way to pretend it isn’t. Git itself tells us in the
message that we can get around this using git commit -a, but that won’t pick
up the untracked file!
Since I’m writing a book about Git, I have to find a way to defend this
behavior. The key thing to remember is that commits are supposed to be
small. The Git folks realize that in the real world, a developer works on
several things at once, files get changed, odd files are inadvertently created.
Git is biased toward explicit behavior when it comes to choosing what files
to commit, so we need to tell it exactly what files belong in this commit. We
might do it like this:
git add .
git commit -m "README and content"
Note the period at the end of the git add command, telling Git to add any
new and changed files in the current directory. The git add command is
recursive, so this would also cover subdirectories.
Removing files
One more caveat: in versions of Git before 2.0, while this picks up added
and modified files, it doesn’t pick up missing files (files that are being
tracked by Git but don’t appear any longer in the working copy). Git 2.0
changed this, but on older versions there’s a little extra work.
The command git rm is designed to stage a file for removal. It will also
remove the file from our working copy. If there are lots of deleted files, and
we deleted them with regular “rm” rather than the Git version, it’s a pain to
do git rm with each one, so there’s one more version of the add command
to help us:
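First, a sketch of the problem itself. I'm assuming two hypothetical files, bad01 and bad02, that were committed earlier and then deleted with plain rm; the identity variables are placeholders.

```shell
scratch="$(mktemp -d)" && cd "$scratch"      # throwaway scratch directory
# Placeholder identity so commits work even where Git isn't configured yet.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git -c init.defaultBranch=master init -q repo
cd repo
echo "Hi there" > README
echo "oops" > bad01
echo "oops" > bad02
git add .
git commit -q -m "Initial"

# Delete the files with plain rm rather than git rm.
rm bad*

# The deletions aren't staged, so nothing is committed.
git commit -m "Remove bad files" || echo "Nothing was committed"

# " D" marks a tracked file that is missing from the working copy.
git status --short
```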
Again we find that nothing happens because those changes weren’t staged.
We can’t do git rm bad* because those files are already gone from the
working copy. But we can do this:
git add -A
git commit -m "Remove bad files"
As of Git 2.0, this staging of removed files is the default and git add .
works for both cases.
Wrapping Up
The idea with staging is that Git expects us to identify explicitly what files
we want it to commit, by “staging” them. When we stage a file, Git actually
makes a copy of it in its own space in .git, and that’s what will be
committed. The idea here is that as you work, you can add changes to a
commit, and then finalize it with a message when you’re ready.
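One way to convince yourself that Git really keeps its own copy of a staged file is to keep editing after staging. This is a sketch with a made-up file called notes and placeholder identity variables.

```shell
scratch="$(mktemp -d)" && cd "$scratch"      # throwaway scratch directory
# Placeholder identity so commits work even where Git isn't configured yet.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git -c init.defaultBranch=master init -q repo
cd repo
echo "Hi there" > README
git add README
git commit -q -m "Initial"

echo "First draft" > notes         # hypothetical file
git add notes                      # stage the file as it looks right now
echo "Second thoughts" >> notes    # keep editing after staging

git diff --cached --stat           # what is staged for the next commit
git diff --stat                    # what has changed since staging

git commit -q -m "Add notes"
git show HEAD:notes                # the commit holds the staged version only
```

The later edit is still sitting in the working copy, ready to be staged for a separate commit.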
So far we’ve used Git as a single-person version control tool, but of course
in the real world it’s for collaboration by a team. This means we’ve got to
start using Git commands that are designed to move commits between
repositories. That’s for the next chapter.
Chapter 3
Clone, Push and Pull
Multiple Repositories
In Subversion, the only way we work with multiple repositories is through a
mirror, which has to have exactly the same history to work correctly. Git is
designed around multiple repositories; to help work on a project, you have
to have not just a working copy but a “clone” of the whole repository so
you can commit, push, and pull.
Anyway, to work with multiple repositories, Git doesn’t care where they are
as long as it can get to them. Directories on a file system work just fine.
Bare Repositories
One thing I am forced to talk about is a “bare” repository. This is just a
repository with no working copy. We’re going to be using “push” to send
commits to a “remote” repository, and Git wants to make sure that we don’t
mess up the remote repository’s working copy by doing that. So by default,
it’s going to reject any push that isn’t targeted at a bare repository. Server
repositories are always bare, so you won’t see this issue in real life.
I could make you create a new bare repository, but then we wouldn’t have
the stuff from last chapter. Instead, a quick workaround; we’ll make a bare
repository based on the non-bare one we created last time. This workaround
also teaches an important command, clone.
Assuming we start back in our scratch directory (so if we’re still in repo,
we need to do cd ..), do this:
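One way to do it is git clone with the --bare option. The sketch below rebuilds a tiny repo first so it runs on its own; the names repo and shared come from the text, while the identity variables are placeholders.

```shell
scratch="$(mktemp -d)" && cd "$scratch"      # throwaway scratch directory
# Placeholder identity so commits work even where Git isn't configured yet.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Stand-in for the repository from the last chapter.
git -c init.defaultBranch=master init -q repo
(cd repo && echo "Hi there" > README && git add . && git commit -q -m "Initial")

# Make a bare copy: all the history, no working copy.
git clone --bare repo shared

# The bare repository has no .git subdirectory; its contents are the repository.
ls shared
git -C shared rev-parse --is-bare-repository
git -C shared log --oneline
```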
From here on out I’ll ignore repo and just talk about shared. Keep it
around, though; we may do something neat with it later in the book.
Cloning
Last time we started with a brand new project, but only one person gets to
make that first repository. Everyone else needs to clone it. This is important
because it gets the whole history of changes that have been made, so when
we make new changes and commit them, other people can apply them
easily.
We’ll assume that now there are going to be two imaginary friends helping
us on this project; call them Harry and Isabelle. They each need their own
space to work.
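Giving each of them a clone might look like this sketch. It rebuilds shared from scratch so it can run on its own; the directory names harry and isabelle come from the text, and the identity variables are placeholders.

```shell
scratch="$(mktemp -d)" && cd "$scratch"      # throwaway scratch directory
# Placeholder identity so commits work even where Git isn't configured yet.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Stand-in for the shared bare repository.
git -c init.defaultBranch=master init -q repo
(cd repo && echo "Hi there" > README && git add . && git commit -q -m "Initial")
git clone -q --bare repo shared

# One clone per person.
git clone shared harry
git clone shared isabelle

# Each clone has its own .git directory, working copy, and full history.
ls -a harry
git -C isabelle log --oneline
```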
shared is the “bare” repository we made a minute ago, and it has all the
work we did in the last chapter. With these commands, we’re telling Git to
copy that repository to a new one called harry (and another new one called
isabelle). You can see for yourself that each of these new directories has a
.git directory and each directory has the latest content in its working copy.
In fact, we could now throw away the shared directory and continue
committing to one or the other of these new repositories (and even pull
content between them directly). But to be more realistic, we’ll pretend that
shared is the single shared repository that both Harry and Isabelle can see.
One other note: since I mentioned git log, which as expected shows
history, I should also mention git status. For the following chapters, you
won’t see git status in the Git command stream because it would have
broken up the flow of what I’m showing. That doesn’t mean it’s not
important. It tells you all kinds of useful things, including what files you
have that need to be committed and where you stand with respect to other
repositories that you’re talking to.
Sharing Commits
Harry wants to make some changes:
cd harry
echo "Adding some more content" >> content01
echo "Second content" > content02
git add .
git commit -m "Harry content"
At this point, git log in harry will look different from git log in shared
or isabelle. Not only Harry’s working copy but also his repository is now
different from the others. In order for Isabelle to see
Harry’s changes, he could send her the commit directly (via an email with a
patch or something), but it’s much better for him to just push it to their
shared repository.
git push
Now git log in harry and shared look exactly the same, but git log in
Isabelle’s repository is out of date. She needs to get those changes. For
once, the command makes sense; it’s the opposite of “push”, so it must be
“pull”.
cd ../isabelle
git pull
(Note for the pedantic: here is where I am not going to talk about the DAG.
Nor am I going to talk about how “pull” is “fetch” plus “merge” and what
each of those does. Nor will I talk about local and remote references, or the
difference between “master” and “origin/master”. Save that stuff for when
we need it, which is very rarely if we use Git the way most people use it.)
Wrapping Up
So far this looks a lot like Subversion. Sure, we saved the change to a local
repository first, and we called that a commit, but really the “push” was a lot
like a Subversion commit, and “git pull” was a lot like “svn up”.
Of course, this is true! At the end of the day, we’re still managing files and
the changes that we make to those files over time. But as we go along, we’ll
get into situations where the behavior is a little different from Subversion,
so it’s important to recognize that something “happened” when we did git
commit, not just when we did git push and git pull.
Next time we’ll get into what happens in the far more likely case that Harry
and Isabelle need to change things at the same time.
Chapter 4
Simultaneous changes
Teamwork
Of course, we could continue working like the previous chapter. Harry or
Isabelle makes a change, pushes it, the other pulls it, and everyone is kept
up to date. But in a real environment, Harry and Isabelle are going to be
making changes at the same time.
cd harry
echo "Third content" > content03
git add .
git commit -m "Third"
And meanwhile…
cd ../isabelle
echo "More second" >> content02
git add .
git commit -m "Second"
At this point, git log for Harry and Isabelle is different, and neither has
pushed their change to shared. Whoever pushes first will “win” in the sense
that the other will be responsible for dealing with the effects.
git push
cd ../harry
git push
Back comes a message telling Harry that the push failed. Be grateful; in
modern versions of Git there’s a nice “hint” that describes the problem very
clearly. Before, we got cryptic error messages and we walked through the
snow, uphill both ways, to get them.
This error isn’t really that bad of a problem; Harry just has to “pull” first.
git pull
Merging in Git
If you’ve used version control, you’re used to merging other people’s
changes. You make your working copy look the way it should as the “next”
version, and then you commit. We have the same goal with Git; the idea is
to make the working copy look like it should, incorporating everyone’s
changes.
But there is one important difference that comes from Git being distributed:
we are merging commits, not just changes to files! So in one sense, even
though I haven’t discussed branches yet, it’s like Harry just created a
(nameless) branch, because now his commit history differs from isabelle
and shared. (Isabelle got to shared first, so her commit history is the
“official” one.)
Merging commits seems more complicated at first. But it’s one of the most
important, powerful, and valuable features of Git. First, it allows us to
commit changes even if we’re not connected to the server. Second, it’s
actually much safer, because both Harry’s and Isabelle’s changes are saved
and can be accessed separately forever. So if the merge is done wrong, we
can more easily go back and figure out why and fix it.
Harry can’t continue with the history in his local repository differing from
the “official” history; it’s interfering with his ability to bring in work from
Isabelle. To sort things out, Git requires us to create a “merge commit”.
This is a commit that combines Harry’s commits with the one he
downloaded from the server.
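Here's a self-contained sketch of the whole exchange, ending with a look at the merge commit. Two things are my assumptions rather than part of the walkthrough: --no-edit accepts the default merge message instead of opening an editor, and --no-rebase spells out that we want a merge, which recent versions of Git may insist on when histories have diverged.

```shell
scratch="$(mktemp -d)" && cd "$scratch"      # throwaway scratch directory
# Placeholder identity so commits work even where Git isn't configured yet.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Stand-ins for shared, harry, and isabelle.
git -c init.defaultBranch=master init -q repo
(cd repo && echo "Hi there" > README && git add . && git commit -q -m "Initial")
git clone -q --bare repo shared
git clone -q shared harry
git clone -q shared isabelle

# Both commit; Isabelle pushes first.
(cd harry && echo "Third content" > content03 && git add . && git commit -q -m "Third")
(cd isabelle && echo "More second" >> content02 && git add . && git commit -q -m "Second" && git push -q)

cd harry
git push || echo "Push rejected: shared has a commit Harry lacks"

# Pull creates the merge commit that ties the two histories together.
git pull --no-rebase --no-edit

# The graph shows one commit with two parents.
git log --oneline --graph
```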
For this example, you probably noticed that I intentionally made changes
that were compatible and would not cause a conflict. As a result, we don’t
have to do any conflict resolution. That particular pain is coming in the next
chapter.
Successful Push
After Harry exits the editor, Git will have its one accepted sequence of
commits (see git log for yourself to find out what that looks like) and
Harry can push his changes to the server.
git push
When Isabelle does a pull, she will get Harry’s new changes:
cd ../isabelle
git pull
And at this point everyone’s git log is the same again. Of course, if
Isabelle had made commits in the meantime, she would have to pull before
pushing, and so on.
Later we’ll talk about feature branches, which are a much more elegant way
to deal with routine and regular merging. When a feature branch is merged
back into the “main line”, there will still be a “merge commit”, but only one
for the whole feature.
Wrapping Up
At this point, this still looks a lot like Subversion at the surface. When I had
new folks starting out with Subversion, I used to use this (approximate)
rhyme, which I thought was clever but which never caught on:
Still this lesson you must get;
First you update, then you commit.
With Subversion, that merging occurs when you have uncommitted changes
in your working tree, and there’s no convenient way to store them so they
don’t get damaged during the merge. (Of course, you can branch, but that’s
also a server-side activity and a little bit of a pain given how frequently it
happens.)
With Git, those files are safely committed; you never have to merge into a
modified working copy.
In the next chapter, we’ll look at what happens in the inevitable case where
the changes happen at the same time, to the same files.
Chapter 5
Constructive Conflict
A Conflict
This time, Harry and Isabelle both decide to add a line at the end of the
same file. Starting from the scratch directory again:
cd harry
echo "Harry's line" >> content01
git commit -am "Harry"
git push
cd ../isabelle
echo "Isabelle's line" >> content01
git commit -am "Isabelle"
git push
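Isabelle's push is rejected, and this time pulling stops on a conflict. The sketch below replays the scenario from scratch, looks at the conflict markers, and then backs out with git merge --abort. The identity variables are placeholders, and --no-rebase is my addition to request a merge explicitly, which recent versions of Git may insist on.

```shell
scratch="$(mktemp -d)" && cd "$scratch"      # throwaway scratch directory
# Placeholder identity so commits work even where Git isn't configured yet.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Stand-ins for shared, harry, and isabelle, with content01 already committed.
git -c init.defaultBranch=master init -q repo
(cd repo && echo "First content" > content01 && git add . && git commit -q -m "Initial")
git clone -q --bare repo shared
git clone -q shared harry
git clone -q shared isabelle

(cd harry && echo "Harry's line" >> content01 && git commit -q -am "Harry" && git push -q)

cd isabelle
echo "Isabelle's line" >> content01
git commit -q -am "Isabelle"
git push || echo "Push rejected; Isabelle has to pull first"

# Both changed the end of the same file, so the merge stops partway.
git pull --no-rebase || echo "Merge stopped on a conflict"

# Git marks the disputed lines with <<<<<<<, =======, and >>>>>>>.
cat content01

# Back out of the merge and return to Isabelle's last commit.
git merge --abort
git status --short
```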
Back in the olden days before electricity, we backed out of a conflicted
merge with git reset --hard HEAD instead of git merge --abort. It still
worked. You will read in Git tutorials that the --hard option to git reset
can be unsafe because it discards uncommitted changes in the working copy.
This is not an issue for us, because we are smart enough to always pull only
when all of our changes have been committed (or stashed). That’s a good
rule to follow, because while git merge --abort will try to put your
working copy back the way it was, and generally will succeed, if you’ve
committed all of your changes you’re guaranteed to be able to get them
back.
I haven’t introduced git reset yet. Why? Because Harry and Isabelle
haven’t made any mistakes. They’re very good at their jobs. But they’ll
make some eventually.
Isabelle runs
git pull
again, gets the conflict back, and this time edits the file content01 to
resolve the conflict.
After that she can just:
git commit -a
git push
Note that in this case Isabelle didn’t use the -m option to provide a message.
This causes Git to pop up an editor with a merge message already built-in,
plus a list of the files that were in conflict. That information then becomes
part of the log.
At least for me, I use the -m switch to git commit about 0.1% of the time.
It’s just as fast to type a message into the editor, plus in the editor view Git
gives you one last chance to verify that the right stuff is being committed. If
you don’t like what you see, you can simply exit the editor without saving
and the commit will be canceled.
Git Stash
Unfortunately, life isn’t always easy. Isabelle might have been in the middle
of a change when Harry really needed her to pull his changes and look at
them. A commit in Git is small and cheap, so it would be OK for Isabelle to
just commit her changes anyway and then come back and fix them later.
But Isabelle has another option.
cd ../harry
git pull
echo "Third down and 10" >> content03
git commit -am "Down and distance"
git push
cd ../isabelle
echo "Third, Third Third" >> content03
At this point Isabelle wants to pull in Harry’s change, presumably to run it.
(I know, we aren’t writing code, but pretend.) If she commits her change
first, she’ll have to resolve the conflict before she gets a clean working copy
with Harry’s changes. So instead:
git stash
git pull
The changes come in cleanly. After Isabelle is done and is ready to get her
changes back, she does:
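The command for that is git stash pop. Here's the whole stash round trip as a self-contained sketch; the setup and identity variables are placeholders I've added so it runs on its own.

```shell
scratch="$(mktemp -d)" && cd "$scratch"      # throwaway scratch directory
# Placeholder identity so commits work even where Git isn't configured yet.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Stand-ins for shared, harry, and isabelle, with content03 already committed.
git -c init.defaultBranch=master init -q repo
(cd repo && echo "Third content" > content03 && git add . && git commit -q -m "Initial")
git clone -q --bare repo shared
git clone -q shared harry
git clone -q shared isabelle

(cd harry && echo "Third down and 10" >> content03 && git commit -q -am "Down and distance" && git push -q)

cd isabelle
echo "Third, Third Third" >> content03   # uncommitted work in progress

git stash        # set the work aside; the working copy is clean again
git pull         # Harry's change comes in cleanly

# Bring the work back. It touches the same lines, so Git reports a conflict.
git stash pop || echo "Stash applied with a conflict"
cat content03

# After a conflicted pop, Git keeps the stash entry around just in case.
git stash list
```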
At this point she has to deal with the conflict, by editing the file as above
and then committing the change.
Yes, the reference to “pop” means that the stash is a stack, and yes, that
means it’s possible to have more than one commit on it. However, it’s not a
good idea. In fact, while git stash is a good thing to know, feature
branches typically make it unnecessary under normal circumstances.
Wrapping Up
If you’ve been following along perfectly (congratulations) you just need to
edit content03 in Isabelle’s directory to remove the conflict tags, then do:
git add .
git commit -m "Resolve"
git push
cd ../harry
git pull
However, if you haven’t been following as closely but want to catch up, it’s
entirely possible that you’ve got some conflicts or commits in either Harry’s
or Isabelle’s repository that haven’t been resolved yet. Before we go on,
you’ll want to make sure both sides have no remaining uncommitted
changes and are up-to-date with each other. The file content doesn’t matter
as I’ll avoid making any assumptions about how the conflicts were
resolved.
Neither the workflow nor the capabilities in this chapter are different in Git
from how they are in Subversion. Resolving conflicting changes is painful
no matter what. As we work through subsequent chapters I hope to
demonstrate some ways that successful Git projects use Git to control the
pain of merging. At the very least, they are successful at putting that pain in
a box so that they can decide when to experience it. They do this by getting
away from making all changes on a single branch; in fact, most successful
Git projects are extremely branch-happy.
Yes, the chapter title puns seem to be getting worse as we go. Not really
something I’m in control of.
At the same time, on my last big project using Subversion, we didn’t really
find ourselves branching all that much. For the most part, a branch came
about when it was time for a release, and we didn’t typically commit many
changes on that branch afterwards. (I know, that sounds more like a tag, but
(a) Subversion doesn’t care about “branch” versus “tag” semantic
squabbles; and (b) we did commit on the branch sometimes.)
I worked for a while on a project that used Subversion for something called
“feature branches”. The whole project was organized using a ticket system,
and before you worked a ticket, you made a branch with that ticket number
as the name, worked all your changes in there, and notified someone when
you were done. It was good because the senior developers got to look over
code changes and often had good suggestions for you to implement before
they agreed to merge your change into the trunk. Peons like me didn’t ever
commit to trunk.
What’s interesting is that this happened years before I ever used Git for the
first time, but for me it was the natural, obvious, “of course you do it that
way” workflow. Production came from trunk, so before you mess it up, you
do everything you can to make sure your changes are good changes.
This is the Git workflow! All the stuff about feature branches and pull
requests is based on the idea that in software development, we group
related changes into features and we work those features in parallel, either
as a team or individually. Like I said before, in Git we want commits to be
as small as possible, so we use branches to group related commits into a
feature so they can land in a product all at once.
One last Subversion note: even at the time, there was a way to do that
merge and not break things the way I did. And later versions of Subversion
add merge tracking that seems to work really well. But I think most people
would agree that feature branches are handled amazingly well in Git
compared to other tools.
cd harry
git checkout -b shakespeare
It looks a little strange to use git checkout to make a new branch, but no
stranger than using svn copy. I’m showing git checkout -b even though
there is a git branch command that will do it, because git branch doesn’t
switch to the new branch, and when you’re using feature branches you
don’t need the hassle of remembering to switch before you commit changes.
Note that this command is a lot different from branching in Subversion
using svn copy. In particular, we didn’t have to specify a “remote” URL.
This branch is totally local to Harry’s Git repository. If you do git branch
in Harry’s repository, you’ll see it, but if you do git branch in shared or
isabelle you won’t.
This new branch is based off the latest work that Harry pulled. Harry can do
all the work he wants here. Anything he commits will affect the
shakespeare branch only, and will not affect the original branch (which is
called master).
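Here's a sketch of Harry making a couple of commits on the branch. The file name spear01 comes from later in the chapter, but the quotes and commit messages are made up, and I'm using a standalone repository rather than a clone to keep the example short.

```shell
scratch="$(mktemp -d)" && cd "$scratch"      # throwaway scratch directory
# Placeholder identity so commits work even where Git isn't configured yet.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Stand-in for Harry's repository.
git -c init.defaultBranch=master init -q harry
cd harry
echo "Hi there" > README
git add .
git commit -q -m "Initial"

# Create the feature branch and switch to it in one step.
git checkout -b shakespeare

# Two small commits on the branch (hypothetical content).
echo "To be, or not to be" > spear01
git add .
git commit -q -m "Hamlet"
echo "That is the question" >> spear01
git commit -q -am "More Hamlet"

git branch             # the current branch is marked with *
git log --oneline      # includes both new commits
```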
At this point git log will show those two new commits, because Harry is
still working in the feature branch.
Keeping Up To Date
In the meantime, Isabelle has been working on other changes. She hasn’t
learned about feature branches yet:
cd ../isabelle
echo "This is no ordinary line" > content04
git add .
git commit -m "Fourth"
git push
Isabelle pushed her changes, and Harry wants them. However, he doesn’t
want them in his feature branch; that branch is just for Shakespeare. So he
switches back to master:
cd ../harry
git checkout master
git pull
If you’re following along, you’ll see that Git reports this as a “fast-
forward”. No merge is taking place here. Also, you’ll notice that git log
does not show our two Shakespeare-related commits, and the new spear01 file
is not in the working copy.
Merging a Feature
Assuming Harry is done with Shakespeare for now, he’ll merge those
changes in from the feature branch. He’s already in master, which is where
he wants to be to merge in changes.
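The merge itself is a single git merge command naming the feature branch. In this sketch a local commit on master stands in for the change Harry pulled from Isabelle, the spear01 content is made up, and --no-edit is my addition so the default merge message is accepted without opening an editor.

```shell
scratch="$(mktemp -d)" && cd "$scratch"      # throwaway scratch directory
# Placeholder identity so commits work even where Git isn't configured yet.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Stand-in for Harry's repository.
git -c init.defaultBranch=master init -q harry
cd harry
echo "Hi there" > README
git add .
git commit -q -m "Initial"

# Work on the feature branch (hypothetical content).
git checkout -q -b shakespeare
echo "To be, or not to be" > spear01
git add .
git commit -q -m "Hamlet"

# Meanwhile master moves on; this stands in for Isabelle's pulled change.
git checkout -q master
echo "This is no ordinary line" > content04
git add .
git commit -q -m "Fourth"

# Merge the feature branch into the branch we're on.
git merge --no-edit shakespeare

git log --oneline --graph
ls
```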
Git brings up the editor to let us make a merge commit, and once we save
and exit the editor window, the change happens. All the stuff I said
previously about handling conflicts, aborting merges, all that stuff applies
here as well.
git push
cd ../isabelle
git pull
cd ..
Now Isabelle has the changes too. She does not, however, have a copy of
the shakespeare branch; she only gets the new file spear01 because it was
merged into master. (She does have a master branch, of course. It got
created when we cloned the shared repository for her.)
Wrapping Up
This was a really basic feature branch, and I didn’t show most of the best
reasons why you might want to use one. Fortunately I’m using the natural
numbers for these chapters, so I’m not likely to run out.
Even though this was a really basic feature branch, I don’t want anyone to
lose sight of what we did here. Without ever using a remote server at all, we
created a branch, committed some changes to it, and merged it back into the
main line. Even that basic capability is enough to change the way that a
developer works when they’re working multiple tasks at the same time
(which of course is most of the time). A feature branch represents freedom
from worrying about leaving the codebase in a broken state while you’re
implementing something complex or risky. It also provides a quick way to
context switch when you’re working multiple things. These benefits of
feature branches exist whether or not you’re using Git, but the ability to
make a feature branch while working disconnected is not something to be
taken lightly.
Chapter 7
Remote Branches
This time, Harry is going to do some more Shakespeare quotes, but Isabelle
is going to help. We still don’t want the changes to hit the main line until
they’re ready, so they’re going to work together on a feature branch.
Harry merged the previous changes in his shakespeare branch, but the
branch itself is still around and he can just switch to it. (Different teams
have different rules about keeping old feature branches around.)
cd harry
git checkout shakespeare
echo "Can the world buy such a jewel?" > spear02
git add .
git commit -m "Claudio"
This is the same as what we did last time, but at this point, Harry wants to
back up his change and also let Isabelle work with him.
Because Git is fully distributed, it needs to allow for the cases where a
repository has lots of different upstream repositories. If we were helping to
develop Linux, we might have branches for Linus’ main line, the current
Red Hat main line, and many others. When we make a new branch, we need
to tell Git which upstream repository this branch belongs to.
If while we’re switched over to shakespeare, we do “git push”, Git will tell
us what I just said, but shorter. It will also tell us how to fix it. When we
clone from a repository, that repository is automatically called “origin”. So
the first time we push a new feature branch, we have a long command:
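A sketch of that first push, using the same --set-upstream form that appears later in the book for another branch. The setup and identity variables are placeholders so the example runs on its own.

```shell
scratch="$(mktemp -d)" && cd "$scratch"      # throwaway scratch directory
# Placeholder identity so commits work even where Git isn't configured yet.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Stand-ins for shared and harry.
git -c init.defaultBranch=master init -q repo
(cd repo && echo "Hi there" > README && git add . && git commit -q -m "Initial")
git clone -q --bare repo shared
git clone -q shared harry

cd harry
git checkout -q -b shakespeare
echo "Can the world buy such a jewel?" > spear02
git add .
git commit -q -m "Claudio"

# With default settings Git doesn't yet know where this branch should go.
git push || echo "No upstream configured for this branch yet"

# The long form: send the branch to origin and remember the pairing.
git push --set-upstream origin shakespeare

# From now on a plain push is enough.
git push
git branch -vv     # shows which remote branch each local branch tracks
```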
This tells Git to send the branch to “origin”, which is the label Git uses to
refer to shared. It tells it that the feature branch should be called
shakespeare on “origin” as well. From this point forward, regular git
push will work fine.
The shakespeare branch exists on shared now, and we can pull changes to
it just like we did before with our master branch.
cd ../isabelle
git pull
This is all that Isabelle has to do to get any new feature branches that have
been pushed to shared. She can now switch to that branch and Git will be
smart enough to make a local version of it:
Note that because Isabelle got her branch from “origin”, Git already knows
where to push it.
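A minimal sketch of Isabelle's side, again in a throwaway sandbox. The directory names and the setup are assumptions for illustration only. It shows that checking out a branch that exists only on "origin" creates a local branch that already tracks it.

```shell
set -e
# Throwaway sandbox; every path below is an illustrative assumption.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Harry GIT_AUTHOR_EMAIL=harry@example.com
export GIT_COMMITTER_NAME=Harry GIT_COMMITTER_EMAIL=harry@example.com

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "To be, or not to be" > seed/spear01
git -C seed add . && git -C seed commit -q -m "Hamlet"
git clone -q --bare seed shared
git clone -q shared harry
git clone -q shared isabelle

# Harry publishes a feature branch after Isabelle has already cloned.
cd harry
git checkout -q -b shakespeare
echo "Can the world buy such a jewel?" > spear02
git add . && git commit -q -m "Claudio"
git push -q --set-upstream origin shakespeare

# Isabelle's pull fetches the new branch as origin/shakespeare ...
cd ../isabelle
git pull -q
# ... and checking it out by name creates a local branch that tracks it.
git checkout -q shakespeare
tracking=$(git rev-parse --abbrev-ref '@{upstream}')
echo "isabelle's shakespeare tracks $tracking"
```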
Either Harry or Isabelle can merge the changes back into master when
they’re done adding Shakespeare quotes:
Wrapping Up
It’s not immediately apparent, but Git did something a little clever here. In
the last chapter, we merged a couple of Harry Shakespeare commits. (Harry is
Bill Shakespeare’s direct patrilineal descendant, but don’t ask him about it,
because then he won’t shut up.)
In this chapter, we used the same feature branch to make a couple of new
commits and then merged them. Because Git stores the parent of each
commit, it can walk back through that history and notice that some of the
commits from the shakespeare branch have already been merged. It
doesn’t try to merge those again, which is good because it would find
spurious “conflicts”. (Before Subversion had merge tracking it was
painfully easy to make it create those spurious conflicts.)
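You can ask Git what it thinks is still unmerged. The sketch below is an assumption-laden illustration (a single throwaway repository and a second quote of my choosing), not the exact session from this chapter. It merges the same feature branch twice and shows that only the new commit is pending the second time.

```shell
set -e
# Throwaway sandbox; the repository layout is an illustrative assumption.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Harry GIT_AUTHOR_EMAIL=harry@example.com
export GIT_COMMITTER_NAME=Harry GIT_COMMITTER_EMAIL=harry@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
echo "To be, or not to be" > spear01
git add . && git commit -q -m "Hamlet"

# First round on the feature branch, merged into master.
git checkout -q -b shakespeare
echo "Can the world buy such a jewel?" > spear02
git add . && git commit -q -m "Claudio"
git checkout -q master
git merge -q --no-edit shakespeare

# Second round on the same branch.
git checkout -q shakespeare
echo "Yea, and a case to put it into" >> spear02
git add . && git commit -q -m "Benedick"
git checkout -q master

# Only commits not yet reachable from master are candidates for the merge.
pending=$(git log --format=%s master..shakespeare)
echo "still to merge: $pending"
git merge -q --no-edit shakespeare
```

The `master..shakespeare` range reads as "commits on shakespeare that master doesn't have yet", which is the same question Git answers for itself before merging.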
We’ve still got at least one more chapter on feature branches, because
we’ve been working again with the “happy path” where no one gets in
another person’s way. We need to look at more realistic cases.
Chapter 8
Conflicts are a Feature, Not a Bug
cd harry
git checkout master
git pull
cd ../isabelle
git checkout master
git pull
cd ../harry
git checkout -b julius
echo "The fault, dear Brutus, is not in our stars" > spear03
git add .
git commit -m "Cassius and Brutus"
git push --set-upstream origin julius
Even though they both created the same files, both are allowed to push to
the shared repository because their changes are on feature branches. This is
another important point about feature branches that makes them worth the
trouble, even when the change is relatively small. They allow you to choose
when to incur the pain of merging, and in the meantime allow you to work
in peace.
So far so good; everything merges without incident. You’ll notice that the
syntax changed a little; this is Git forcing us to be explicit. Because Harry
has never worked with the hamlet branch, he doesn’t really have a local hamlet
branch; he just has the copy he downloaded from shared, which Git calls
origin/hamlet. If you were to just say git merge hamlet, Git would provide a
hint letting you know what to do.
That doesn’t work, because it’s trying to add a file that was added in a
different commit, so we have a conflict.
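Here is a minimal sketch of that kind of collision. To keep it short it uses one throwaway repository with two local branches rather than two people and a shared repository, and the Hamlet quote is my own stand-in; both are assumptions for illustration.

```shell
set -e
# Throwaway sandbox; branch contents are illustrative assumptions.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Harry GIT_AUTHOR_EMAIL=harry@example.com
export GIT_COMMITTER_NAME=Harry GIT_COMMITTER_EMAIL=harry@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
echo "To be, or not to be" > spear01
git add . && git commit -q -m "Starting point"

# Two feature branches start from the same commit and add the same file.
git checkout -q -b julius
echo "The fault, dear Brutus, is not in our stars" > spear03
git add . && git commit -q -m "Cassius and Brutus"
git checkout -q -b hamlet master
echo "Brevity is the soul of wit" > spear03
git add . && git commit -q -m "Polonius"

# The first merge into master is clean; the second one collides.
git checkout -q master
git merge -q --no-edit hamlet
if git merge -q --no-edit julius >/dev/null 2>&1; then
    echo "merged cleanly"
else
    conflicted=$(git diff --name-only --diff-filter=U)
    echo "conflict in: $conflicted"
    git merge --abort   # put master back the way it was
fi
```

Aborting leaves master exactly where it was after the first merge, which is what makes it cheap to back off and fix things elsewhere.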
The point is that when we’re merging into master, the merge should be
clean, because that increases the chance that we’re going to get the change
that we want. So instead of fixing the conflict in master, Harry is going to
fix the conflict in the feature branch. This works in Git because Git does a
really good job of keeping track of what commits have been merged into
what branch.
We get the same conflict, but from the other side – now we’re merging the
Hamlet quote into the feature branch.
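A minimal sketch of fixing the conflict on the feature branch, under the same assumptions as before: one throwaway repository, local branches only, and a Hamlet quote I picked for illustration. The resolution shown (keeping both lines) is also just one possible choice.

```shell
set -e
# Throwaway sandbox; branch contents are illustrative assumptions.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Harry GIT_AUTHOR_EMAIL=harry@example.com
export GIT_COMMITTER_NAME=Harry GIT_COMMITTER_EMAIL=harry@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
echo "To be, or not to be" > spear01
git add . && git commit -q -m "Starting point"
git checkout -q -b julius
echo "The fault, dear Brutus, is not in our stars" > spear03
git add . && git commit -q -m "Cassius and Brutus"
git checkout -q -b hamlet master
echo "Brevity is the soul of wit" > spear03
git add . && git commit -q -m "Polonius"
git checkout -q master
git merge -q --no-edit hamlet

# Merge master into the feature branch and resolve the collision there.
git checkout -q julius
git merge -q --no-edit master >/dev/null 2>&1 || echo "conflict, as expected"
printf '%s\n' "Brevity is the soul of wit" \
    "The fault, dear Brutus, is not in our stars" > spear03   # keep both
git add spear03
git commit -q -m "Merge master into julius"

# Master can now take the feature branch without any conflict.
git checkout -q master
git merge -q --no-edit julius
```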
Wrapping Up
At this point, I’ve introduced the basic workflow for Git, including feature
branches. There are lots more Git commands, and I probably need a few
more chapters to cover some other things, but the vast majority of the time
these commands are the only Git commands I use (including git log and
git status that I mentioned earlier).
Is the Git approach more complex in some ways than similar functions with
Subversion? Definitely. In some ways they’re very similar; if you use
feature branches, you have to think about when to make them and when to
merge them, and you have to remember what branch you’re on before you
commit code. But Git has the extra steps of committing to a local
repository.
The benefit you get for that, in addition to being able to work disconnected,
is that merging becomes much more about combining commits rather than
combining changes into files in the working copy. Not only is that safer and
easier to abort, it makes it easier for Git to track what’s been merged so you
can merge from any angle and get intelligent results.
Next chapter we’ll talk about what happens when it all goes wrong:
recovering from mistakes.
Chapter 9
Be Not Led Astray
That’s very good advice for Git as well. It’s easy to get hung up trying to
find a clever Git solution. Especially because Git is so complicated and
powerful, there are ways to go back and rewrite history. But to paraphrase
Jurassic Park, just because you can doesn’t mean you should.
cd harry
echo "I broke this file" > spear01
git checkout spear01
If you made a typo in a commit message and you haven’t pushed it yet, you
can use git commit --amend:
As I mentioned before, we can leave off the commit message and Git will
launch an editor.
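A minimal sketch of amending a commit message, assuming a throwaway repository and a made-up typo. It passes -m so it can run unattended; leaving -m off launches the editor instead.

```shell
set -e
# Throwaway sandbox; the typo is an illustrative assumption.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Harry GIT_AUTHOR_EMAIL=harry@example.com
export GIT_COMMITTER_NAME=Harry GIT_COMMITTER_EMAIL=harry@example.com
git init -q repo && cd repo
echo "To be, or not to be" > spear01
git add . && git commit -q -m "Hamelt"   # typo in the message

before=$(git rev-parse HEAD)
git commit -q --amend -m "Hamlet"        # replace the last commit
after=$(git rev-parse HEAD)

git log --format=%s -1                   # shows the corrected message
# The amended commit is a new commit with a new hash, which is why
# amending is only safe before the original has been pushed.
echo "before: $before"
echo "after:  $after"
```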
If we did a commit now, the temp file would get committed. (This, by the
way, is why I avoid using git commit -m in real life; better to let Git
launch an editor and review what will be committed.)
If we had just said git reset without parameters both our changes would
have been unstaged.
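A minimal sketch of the difference between resetting one path and resetting everything. The repository and file contents are assumptions for illustration, and I've spelled the path form with an explicit `--` separator, which may differ from how you usually type it.

```shell
set -e
# Throwaway sandbox; file names follow the text, contents are assumptions.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Harry GIT_AUTHOR_EMAIL=harry@example.com
export GIT_COMMITTER_NAME=Harry GIT_COMMITTER_EMAIL=harry@example.com
git init -q repo && cd repo
echo "To be, or not to be" > spear01
git add . && git commit -q -m "Hamlet"

echo "That is the question" >> spear01   # a change we want
echo "scratch" > temp01                  # a file we do not want
git add .                                # stages both

git reset -q -- temp01                   # unstage just temp01
staged=$(git diff --cached --name-only)
echo "staged after targeted reset: $staged"

git reset -q                             # no path: unstage everything
staged_all=$(git diff --cached --name-only)
echo "staged after plain reset: '$staged_all'"
```

Neither form touches the working copy; temp01 and the edit to spear01 are both still on disk afterwards.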
Note that even though we did git add . again, temp01 didn’t get picked up
this time. The addition to .gitignore takes effect immediately, even before
we commit it.
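A minimal sketch of that behavior, assuming a throwaway repository and an ignore entry that names the file exactly (a pattern would work too):

```shell
set -e
# Throwaway sandbox; the ignore entry is an illustrative assumption.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Harry GIT_AUTHOR_EMAIL=harry@example.com
export GIT_COMMITTER_NAME=Harry GIT_COMMITTER_EMAIL=harry@example.com
git init -q repo && cd repo
echo "To be, or not to be" > spear01
git add . && git commit -q -m "Hamlet"

echo "scratch" > temp01
echo "temp01" >> .gitignore   # not committed yet, but already in effect
git add .

staged=$(git diff --cached --name-only)
echo "$staged"                # .gitignore is staged, temp01 is not
```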
That’s not always easy; you may have destroyed the correct version of the
file and want to get it back. If you’re using GitHub or some other on-line
tool, you can navigate history and copy/paste. But that’s a cop out.
To reach back into history, we need to tell Git what commit we’re interested
in. With all the branching, merging, pushing, and pulling, Git doesn’t have a
single authoritative place to store commits, so it can’t number them from 1
to n like Subversion. Instead it uses a hash. You can see that hash in git
log and you can use it as a unique way to refer to a commit; just copy/paste
or type the first few characters and Git will know which one you mean.
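But usually when we need to reach back into history, we just want to refer
to “the commit before the last one” or “two commits ago”. Git uses the term
HEAD to refer to the most recent commit, and right after a commit, HEAD@{1}
refers to the one before that. (Strictly, HEAD@{1} means “where HEAD pointed
before its last move”, which is the same thing as long as all we’ve been
doing is committing. HEAD~1 always means the parent commit, HEAD~2 the one
before that, and so on.) So we don’t need to go look up the hash in the
history to get to it.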
A simple example:
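A minimal sketch of reaching back one commit to recover a file, assuming a throwaway repository. It uses HEAD~1, which always names the parent commit, and then records the repair as a new commit rather than rewriting anything.

```shell
set -e
# Throwaway sandbox; the commits are illustrative assumptions.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Harry GIT_AUTHOR_EMAIL=harry@example.com
export GIT_COMMITTER_NAME=Harry GIT_COMMITTER_EMAIL=harry@example.com
git init -q repo && cd repo
echo "To be, or not to be" > spear01
git add . && git commit -q -m "Hamlet"
echo "I broke this file" > spear01
git commit -q -am "Oops"

# Take the file's content from the commit before the last one ...
git checkout HEAD~1 -- spear01
# ... and record the repair as a new commit instead of rewriting history.
git commit -q -m "Restore spear01"

cat spear01
git log --format=%s
```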
Wrapping Up
So far we’ve listed ways to fix most things we could do wrong. There are
lots of ways in Git to do exactly what I did here, but I like these better
because they’re the least intrusive; for the most part we just use
Git to get back the content we want, then make a new commit to fix the
repository. This is good because there’s less chance of error and because,
except for git commit --amend, these commits are safe even if we already
pushed the bad change.
When we start working with branches, there are a couple more ways things
could go wrong. The techniques provided here will get us out of a lot of
those situations, too, but they may not be the most elegant way to get out.
Next chapter I’ll talk about better ways to handle those issues.
Chapter 10
Serious Issues
The most important thing with Git when this happens: don’t panic, and
don’t push. Anything can be fixed, but it’ll be fixed a lot more easily if it
hasn’t been pushed yet.
If you just need to redo a commit, with a different branch as target, it’s
pretty easy.
cd harry
git checkout -b much-ado
echo "Were she other than as she is, she were unhandsome" >> spear02
git add .
git commit -m "Benedick"
git checkout master
echo "But being no other but as she is, I do not like her" >> spear03
git commit -am "Benedick continues"
We switched back to master probably for some good reason. Then we
forgot we switched and went back to making commits that belong on our
feature branch.
In this case, we don’t want that commit to apply to master at all. We need
to rewind master to the point before that commit, but in a way that keeps
the change in our working copy so we can apply it to the branch.
We used reset rather than checkout this time. Last time we were content to
just make a new commit after undoing the bad stuff. This time we want that
commit to have never happened, because it would be confusing to people to
see a “Much Ado” commit on master before that feature branch got merged
in.
We also used --soft, which wasn’t strictly necessary, but it’s a nice touch
because it leaves our “staged” changes. This means we can redo the commit
without having to worry about doing git add or git commit -a. Don’t
worry about this kind of touch while you’re learning Git; it’s the kind of
thing that comes naturally over time. The more mistakes you make, the
faster you get to learn the different ways to fix them.
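One other thing: here I used yet another way to refer to “the commit just
before the last one”. In this situation it names the same commit as HEAD@{1}
(the two only differ when the last thing HEAD did was something other than a
commit), but I wanted to show you both because you’ll see them both.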
Now that we’ve backed out the commit, we can switch branches and
commit it where it belongs.
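The whole rescue, start to finish, might look like the sketch below. It runs in a throwaway repository with a made-up starting commit, and it writes "the commit just before the last one" as HEAD^; both are assumptions for illustration rather than the exact session from this chapter.

```shell
set -e
# Throwaway sandbox; the starting commit is an illustrative assumption.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Harry GIT_AUTHOR_EMAIL=harry@example.com
export GIT_COMMITTER_NAME=Harry GIT_COMMITTER_EMAIL=harry@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
echo "Can the world buy such a jewel?" > spear02
echo "The fault, dear Brutus, is not in our stars" > spear03
git add . && git commit -q -m "Starting point"

git checkout -q -b much-ado
echo "Were she other than as she is, she were unhandsome" >> spear02
git commit -q -am "Benedick"
git checkout -q master
echo "But being no other but as she is, I do not like her" >> spear03
git commit -q -am "Benedick continues"   # oops: this landed on master

# Drop the commit from master but keep its change staged in the index.
git reset -q --soft HEAD^
# The staged change travels with us to the branch where it belongs.
git checkout -q much-ado
git commit -q -m "Benedick continues"

git log --format=%s master     # Starting point
git log --format=%s much-ado   # Benedick continues, Benedick, Starting point
```

The branch switch works with a staged change because spear03 is identical on both branches at that point; if it weren't, Git would refuse rather than lose the change.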
Merging in Traffic
You probably don’t care, but what happened to the original “Benedick
continues” commit, the one that we committed to master and then backed
out? It didn’t go anywhere. Really, all we did was just change the history
for master so it no longer included that commit. That commit is still
floating out there but it no longer belongs to any branch.
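If you do care, you can go and look. This is a minimal sketch in a throwaway repository (the commits are assumptions for illustration) showing that the backed-out commit belongs to no branch but is still stored, and that Git's record of where HEAD has been still points at it.

```shell
set -e
# Throwaway sandbox; the commits are illustrative assumptions.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Harry GIT_AUTHOR_EMAIL=harry@example.com
export GIT_COMMITTER_NAME=Harry GIT_COMMITTER_EMAIL=harry@example.com
git init -q repo && cd repo
echo "The fault, dear Brutus, is not in our stars" > spear03
git add . && git commit -q -m "Starting point"
echo "But being no other but as she is, I do not like her" >> spear03
git commit -q -am "Benedick continues"

orphan=$(git rev-parse HEAD)   # remember the commit we are about to drop
git reset -q --soft HEAD^

# No branch contains the commit any more ...
branches=$(git branch --contains "$orphan")
# ... but the object is still in the repository, and HEAD@{1} (where
# HEAD pointed before the reset) still names it.
kind=$(git cat-file -t "$orphan")
previous=$(git rev-parse 'HEAD@{1}')
echo "branches: '$branches' type: $kind"
```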
Let’s say that Harry thinks he wants to merge his feature branch into
master:
At this point the marketing guy shows up and tells him that the feature
needs to wait until version 2.0 because they plan a price increase then and
need to justify it. Harry needs to get that feature out of master. He doesn’t
remember how many commits were in that feature branch, so he does git
log and finds the last “good” commit. In my repository, that’s
391590ed0605807042eb0dbd0eb9054396a5ec1a; you’ll have to look up
your own.
It makes sense to use --hard here because we want Git to also update the
working copy. It’s safe because those commits are still available on the
feature branch. In fact, if the marketing guy were to show up and say he just
remembered that he promised that feature in the next release after all, Harry
could just git merge much-ado again and everything would be back where
it was a moment ago.
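A minimal sketch of backing a merge out of master and then putting it back, assuming a throwaway repository. To stay runnable it captures the last "good" commit in a variable before merging instead of looking it up in git log afterwards; that shortcut is mine.

```shell
set -e
# Throwaway sandbox; the starting commit is an illustrative assumption.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Harry GIT_AUTHOR_EMAIL=harry@example.com
export GIT_COMMITTER_NAME=Harry GIT_COMMITTER_EMAIL=harry@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
echo "Can the world buy such a jewel?" > spear02
git add . && git commit -q -m "Starting point"

git checkout -q -b much-ado
echo "Were she other than as she is, she were unhandsome" >> spear02
git commit -q -am "Benedick"
echo "But being no other but as she is, I do not like her" >> spear02
git commit -q -am "Benedick continues"

git checkout -q master
good=$(git rev-parse HEAD)        # the last "good" commit on master
git merge -q --no-edit much-ado   # the merge marketing wants undone

git reset -q --hard "$good"       # rewind master and the working copy
after_reset=$(git rev-list --count master)

git merge -q --no-edit much-ado   # change of heart: merge it again
after_merge=$(git rev-list --count master)
echo "commits on master: $after_reset after reset, $after_merge after re-merge"
```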
What If I Pushed?
That example worked because Harry had not pushed the change to shared
yet. But that’s not very realistic. I still maintain that the best solution is just
to get the files back to the right state and make a new commit. But in some
particular, probably rare situations, that might not be preferred. What about
cases where someone committed personal information to the repository? It’s
not OK to just leave that sitting around in an old version.
Now we need to push this change to shared, but it’s not a regular fast-
forward any more, so Git will reject it. This is one case where it’s justified
to do a “forced” push:
git push -f
You can believe me that the right thing happened, or you can cd over to
Isabelle’s repository, make sure master is checked out, and git pull. The
latest “Much Ado” commits won’t be brought in and they won’t appear in
the log.
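A minimal sketch of the pushed case, in a throwaway sandbox with made-up directory names and a made-up "personal information" file; all of that is assumed for illustration. It shows the plain push being rejected and the forced push replacing master on shared.

```shell
set -e
# Throwaway sandbox; paths and file contents are illustrative assumptions.
sandbox=$(mktemp -d) && cd "$sandbox"
export GIT_AUTHOR_NAME=Harry GIT_AUTHOR_EMAIL=harry@example.com
export GIT_COMMITTER_NAME=Harry GIT_COMMITTER_EMAIL=harry@example.com
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "To be, or not to be" > seed/spear01
git -C seed add . && git -C seed commit -q -m "Hamlet"
git clone -q --bare seed shared
git clone -q shared harry && cd harry

echo "phone: 555-0100" > personal
git add . && git commit -q -m "Oops" && git push -q   # already published

git reset -q --hard HEAD^        # remove the commit locally
if git push -q 2>/dev/null; then
    echo "unexpected: plain push accepted"
else
    echo "plain push rejected (not a fast-forward)"
fi
git push -q -f                   # overwrite master on shared
```

Anyone who pulled the bad commit before the forced push still has it locally, which is one more reason to treat this as a last resort.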
Wrapping Up
The stuff in this chapter is the Git equivalent of surgery, and it should be as
rare as surgery. Even though the commands are short, these changes were
relatively complicated to envision. However, these kinds of fixes are
complicated in any version control software, and they’d be practically
impossible in some tools I’ve used. Git lets you do this, but as I said before,
it doesn’t mean you should.
Those familiar with Git will notice that I stayed away from rebase in this
whole discussion. Rebase and its merge companion cherry-pick would
have let us choose to keep some commits from history (or some commits
from a merge) and skip others. They also allow editing commits way back
in time, or editing commits as they’re merged in.
However, I also recognize that most teams using Git have that one person
who gets into the details of the tool and learns the magic. On some teams,
with some tools, I’ve been that person. I also know that most people are not
that person, and it seems silly to me to pretend that someone should learn
how to rebase or cherry-pick to use Git.
Chapter 11
Flow
Tools
I mentioned at the beginning that I wouldn’t talk about setting up shared
repositories on a server, because that kind of thing is best handled in a
GitHub-like tool. (There are many examples: GitHub itself, GitLab,
Gitorious, Atlassian Stash.) Any of those tools can be used on the Internet
or installed on a local server. They provide user management and repository
management. They allow authorized users to push to and pull from shared
repositories, typically using SSH (ideally using key-based authentication) or
HTTP/S.
So they take away a lot of the pain of setting up a server to host Git. But
they also enable a Git workflow with feature branches, in a number of
important ways:
- They provide a way to list feature branches, show activity, and inspect
  the content of the repository.
- They notice when a new feature branch is pushed to the server, and
  offer to create a “pull request”.
- They provide a way to review the code in the feature branch and make
  comments.
- They notice subsequent updates to feature branches and fold them into
  existing pull requests.
- They allow someone to accept the pull request, automatically folding
  in the feature branch. (This also enforces the principle that feature
  branches should merge cleanly.)
- They integrate with build tools so feature branches can be built and
  tested before their changes are integrated.
Workflow
So once you’ve picked a tool and installed it, the other big question is, how
will your team use Git as part of your workflow? Does everything have to
be on a feature branch, or is it OK to commit to master for hotfixes? Will
your team have to maintain branches for older versions and backport high
priority fixes? Do you need an intermediate branch for your next “unstable”
version, as described in Git Flow?
Each team will identify its own answer to these questions, as well as
decisions like what backlog / ticket system will be used to manage work,
and whether it will be used for all work or only for problem reporting.
Conclusion
There are tradeoffs in deciding to use Git as opposed to any other version
control tool. There may be a learning curve for some members of the team,
or a need to convince leadership of the value of introducing yet another
tool.