summaryrefslogtreecommitdiff
path: root/posts/bookclub-20200511.mdwn
blob: 15f7e9fcadf7f12f0f4ab781dca25439f5bdbf33 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
[[!meta title="The Lars, Mark, and Daniel Club"]]
[[!meta author="Daniel Silverstone"]]
[[!meta date="2020-05-11 20:00:00"]]
[[!tag draft]]

Today [Lars][], [Mark][], and I discussed Jeremy Kun's [The communicative value
of using Git well][article] post.  While a lot of our discussion was spawned by
the article, we did go off-piste a little, and I hope that my notes below will
enlighten you all as to a bit of how we see revision control these days.  It
was remarkably pleasant to read an article where the comments section wasn't a
cesspool of horror, so if this posting encourages you to go and read the
article, don't stop when you reach the bottom -- the comments are good and
useful too.

[Lars]: https://blog.liw.fi/
[Mark]: https://blog.sirena.org.uk/
[article]: https://jeremykun.com/2020/01/14/the-communicative-value-of-using-git-well/

***

This was a fairly non-contentious article for us though each of us had points
we wished to bring up and chat about it turned into a very convivial chat.
We saw the main thrust of the article as being about using the metadata of
revision control to communicate intent, process, and decision making.  We
agreed that it must be possible to do so effectively with Mercurial (thus
deciding that the mention of it was simply a bit of clickbait / red herring)
and indeed Mark figured that he was doing at least some of this kind of thing
way back with CVS.

We all discussed how knowing the fundamentals of Git's data model improved our
ability to work wih the tool.  Lars and I mentioned how jarring it has originally
been to come to Git from revision control systems such as Bazaar (`bzr`) but
how over time we came to appreciate Git for what it is.  For Mark this was less
painful because he came to Git early enough that there was little more than the
fundamental data model, without much of the [porcelain][] which now exists.

[porcelain]: https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain

One point which we all, though Mark in particular, felt was worth considering
was that of distinguishing between published and unpublished changes.  The
article touches on it a little, but one of the benefits of the workflow which
Jeremy espouses is that of using the revision control system as an integral
part of the review pipeline.  This is perhaps done very well with Git based
workflows, but can be done with other VCSs.

With respect to the points Jeremy makes regarding making commits which are good
for reviewing, we had a general agreement that things were good and sensible,
to a point, but that some things were missed out on.  For example, I raised
that commit messages often need to be more thorough than one-liners, but
Jeremy's examples (perhaps through expedience for the article?) were all pretty
trite one-liners which perhaps would have been entirely obvious from the commit
content.  Jeremy makes the point that large changes are hard to review, and
Lars poined out that [Greg Wilson did research][gwilson] in this area, and at
least one [article][article200] mentions 200 lines as a maximum size of a
reviewable segment.

[gwilson]: https://arxiv.org/abs/1407.5648
[article200]: https://uwescience.github.io/neuroinformatics/2017/10/08/code-review.html

I had a brief warble at this point about how reviewing needs to be able to
consider the whole of the intended change (i.e. a diff from base to tip) not
just individual commits, which is also missing from Jeremy's article, but that
such a review does not need to necessarily be thorough and detailed since the
commit-by-commit review remains necessary.  I use that high level diff as a way
to get a feel for the full shape of the intended change, a look at the end-game
if you will, before I read the story of how someone got to it.  As an aside at
this point, I talked about how Jeremy included a 'style fixes' commit in his
example, but I **loathe** seeing such things and would much rather it was
either not in the series because it's unrelated to it; or else the style fixes
were folded into the commits they were related to.

We discussed how style rules, as well as commit-bisectability, and other rules
which may exist for a codebase, the adherence to which would form part of the
checks that a code reviewer may perform, are there to be held to when they
**help** the project, and to be _broken_ when they are in the way of good
communication between humans.

In this, Lars talked about how revision control histories provide high level
valuable communication _between developers_.  Communication between humans is
fraught with error and the rules are not always clear on what will work and
what won't, since this depends on the communicators, the context, etc.  However
whatever communication rules are in place should be followed.  We often say
that it takes two people to communicate information, but when you're writing
commit messages or arranging your commit history, the second party is often
nebulous "other" and so the code reviewer fulfils that role to concretise it
for the purpose of communication.

At this point, I wondered a little about what value there might be (if any) in
maintaining the metachanges (pull request info, mailing list discussions, etc)
for historical context of decision making.  Mark suggested that this is useful
for design decisions etc but not for the style/correctness discussions which
often form a large section of review feedback.  Maybe some of the metachange
tracking is done automatically by the review causing the augmentation of the
changes (e.g. by comments, or inclusion of design documentation changes) to
explain _why_ changes are made.

We discussed how the "rebase always vs. rebase never" feeling flip-flopped in
us for many years until, as an example, what finally won Lars over was that he
wants the history of the project to tell the story, in the git commits, of how
the software has changed and evolved in an **intentional** manner.  Lars said
that he doesn't care about the meanderings, but rather about a clear story
which can be followed and understood.

I described this as the switch from "the revision history is about what I did
to achieve the goal" to being more "the revision history is how I would hope
someone else would have done this".  Mark further refined that to "The revision
history of a project tells the story of how the project, as a whole, _chose_ to
perform its sequence of evolution."

We discussed how project history must necessarily then contain issue tracking,
mailing list discussions, wikis, etc.  There are exist free software projects
where part of their history is forever lost because, for example, the project
moved from Sourceforge to Github, but made no effort (or was unable) to migrate
issues or somesuch.  Linkages between changes and the issues they relate to
can easily be broken, though at least with mailing lists you can often rewrite
URLs if you have something consistent like a `Message-Id`.

We talked about how cover notes, pull request messages, etc. can thus also be
lost to some extent.  Is this an argument to always use merges whose message
bodies contain those details, rather than always fast-forwarding?  Or is it a
reason to encapsulate all those discussions into git objects which can be
forever associated with the changes in the DAG?

We then diverted into discussion of CI, testing every commit, and the benefits
and limitations of automated testing vs. manual testing; though I think that's
a little too off-piste for even this summary.  We also talked about how commit
message audiences include software perhaps, with the recent movement toward
[conventional commits][ccommits] and how, with respect to commit messages for
machine readability, it can become very complex/tricky to craft good commit
messages once there are multiple disparate audiences.  For projects the size of
the Linux kernel this kind of thing would be nearly impossible, but for smaller
projects, perhaps there's value.

[ccommits]: https://www.conventionalcommits.org/en/v1.0.0/

Finally, we all agreed that we liked the quote at the end of the article, and so
I'd like to close out by repeating it for you all...

[Hal Abelson][habl] famously said:

> Programs must be written for people to read, and only incidentally for
> machines to execute.

[habl]: https://en.wikipedia.org/wiki/Hal_Abelson

Jeremy agrees, as do we, and extends that to the metacommit information as
well.