I analyzed first sentences and first pages from 566 public-domain and free novels to better understand what makes a good book beginning. From a publishing/programming standpoint, we’ll call a page 250 words. Clearly, math tools won’t write the perfect opening for you, but they can, like spell-checking your resume, prevent readers from dismissing you before they get to the end. I don’t know what ideal is, but I can look at the bell curve of hundreds of attempts and tell you what the bottom 5th percentile looks like.
FIRST SENTENCE
The first sentence should ideally grab your reader and
demand attention. I’m not talking about using an exclamation point. The best
ones set the tone and character without the reader realizing how much you’ve
communicated. Often, they state a theme that the book is going to prove. I only
saw one “it was a dark and stormy night.” I wish I could give good and bad
examples below, but copyright restrictions prevent that. I can relay
observations. The average grade level of these first sentences is 7.4, although applying the Gunning fog
reading level to just one sentence isn’t reliable, so we’ll need other
indicators.
One thing you want to avoid is proper-noun soup where you
bombard the reader with long, foreign words until they give up. The samples averaged
1.3 names per opening sentence, with a record of 9 names. Statistically, anything
over 3 is excessive. If you have a military space opera, though, you can safely
make it four because those readers are accustomed to ranks being part of the
name.
How long is an average first sentence? However long it
takes. I’ve seen everything from a one-word expletive to 112 words for “Les Mis”.
However, if I cut out the few ancient tomes over the 100-word level, we’re left
with a pretty consistent average of 17 words, plus or minus a 10-word standard
deviation. Thus, if you take more than 32 words to grab your reader, it’s
probably too much. Hard words (3+ syllables) averaged only one
occurrence per sentence, usually as part of a name or an adverb. Having over 3 hard
words should be a red flag unless you’re writing a medical thriller. What about
commas, another sign of complexity? Of the sentences I scanned, exactly half had
any. Use one if you need it. The eight percent with more than two are
probably risking their audience, and the guy with seven is daring them to
leave.
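The checks above are easy to automate. Here is a minimal sketch, assuming a crude vowel-group syllable counter and treating any capitalized word after the first as a name; both are my own stand-ins, and only the four cutoffs come from the numbers quoted above. The function name `first_sentence_report` is made up for this example.

```python
import re

# Cutoffs quoted in the text; everything else here is an illustrative assumption.
MAX_WORDS = 32
MAX_HARD_WORDS = 3
MAX_COMMAS = 2
MAX_NAMES = 3


def count_syllables(word):
    """Rough vowel-group count (an assumption, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a trailing silent "e" ("stare" -> 1) but keep "-le" ("table" -> 2).
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def first_sentence_report(sentence):
    """Return (stats, flags) where flags lists every stat over its cutoff."""
    words = re.findall(r"[A-Za-z'’]+", sentence)
    hard_words = [w for w in words if count_syllables(w) >= 3]
    # Capitalized words after the first one stand in for names. This misses
    # a name that opens the sentence and cannot tell ranks from names.
    names = [w for w in words[1:] if w[0].isupper() and w != "I"]
    stats = {
        "words": len(words),
        "hard_words": len(hard_words),
        "commas": sentence.count(","),
        "names": len(names),
    }
    limits = {
        "words": MAX_WORDS,
        "hard_words": MAX_HARD_WORDS,
        "commas": MAX_COMMAS,
        "names": MAX_NAMES,
    }
    flags = [key for key, limit in limits.items() if stats[key] > limit]
    return stats, flags
```

An empty flag list does not mean the sentence is good, only that it avoids the extremes of the sample.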
What should this opening sentence be composed of? First, I
will examine the verbs used. The most common, by far, were forms of IS. Note
the exponential curve, where SAID appears half as often. The remainder of the
top achievers were all action (total 33%+), sensory (8%), or recall (5%) verbs.
These all make sense because they set the tone/mood for your scene and pull the
reader in—except begin/start, which stood out as weak and could have been eliminated
to make a better opening. People tend to die, fall, or awaken much more in the
first line for the sake of drama. They also stare rather than look.
Verb | % of first lines | % of first page
is | 11.9 | 9.1
said | 5.3 | 6.0
know | 2.6 | 2.6
come | 2.3 | 1.9
stand | 2.1 | 0.9
begin | 1.8 | 1.0
sit | 1.8 | 0.8
see | 1.8 | 2.0
go | 1.4 | 1.2
make | 1.4 | 1.6
take | 1.4 | 1.3
die | 1.2 | 0.4
stare | 1.1 | 0.3
think | 1.1 | 1.2
wake | 1.1 | 0.3
fall | 0.9 | 0.5
hear | 0.9 | 0.8
glance | 0.8 | 0.3
look | 0.8 | 1.5
remember | 0.8 | 0.3
believe | 0.6 | 0.6
check | 0.6 | 0.3
feel | 0.6 | 0.9
move | 0.6 | 0.3
wait | 0.6 | 0.4
watch | 0.6 | 0.4
What also speaks volumes are the five common words in the
rest of the novel that we never see in the first line: get, seem, keep, try,
and happen. I’ve found that I can replace almost all instances of the overused
word “get” in my writing with stronger/more specific ones. I suspect the same
may be true of some of the other passive ones.
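If you want to hunt for these five words in your own opening, a short script will do it. This is only a sketch: the inflected forms are a hand-picked list of mine rather than a real lemmatizer, and `weak_verbs_in` is a hypothetical helper name.

```python
import re

# The five verbs reported above as absent from first lines.
# The inflected forms are hand-picked assumptions, not a lemmatizer.
WEAK_VERB_FORMS = {
    "get": {"get", "gets", "got", "gotten", "getting"},
    "seem": {"seem", "seems", "seemed", "seeming"},
    "keep": {"keep", "keeps", "kept", "keeping"},
    "try": {"try", "tries", "tried", "trying"},
    "happen": {"happen", "happens", "happened", "happening"},
}


def weak_verbs_in(sentence):
    """Return the base forms of any listed weak verbs found in the sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    found = []
    for lemma, forms in WEAK_VERB_FORMS.items():
        if any(token in forms for token in tokens):
            found.append(lemma)
    return found
```

For example, `weak_verbs_in("She tried to get the door open.")` returns `["get", "try"]`, which tells you which words to reconsider, not what to replace them with.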
Did any sentence pattern establish itself as dominant? Not
really. Not all the sentences were even complete. In the span of an entire
novel, I see few patterns occur more than 0.5 percent of the time. The most
common are usually:
SV SVAN SVN SVJN
For opening lines, I saw these in lower concentrations plus
a few others, mainly with AJN instead of S and a wide variety of prepositional
phrases. The range is so extreme and sparse that I could make no further generalizations.
SVRSVANPAN AJNVJ
AJNVJPANPNPO JNVJN SVJPN
SVPJN SVPAN SVPN
The next thing to examine is how the parts of speech in the
first sentence differ from those in all the other sentences of a novel. The first
sentence has no profanity or interjections to speak of, and fewer pronouns,
contractions, verbs, and objects. When you think about it, proper nouns have to be
used before the pronouns or objects that represent them. Contractions should only be
used in dialog, so those should occur less often. Openings use more
prepositions, articles, and proper names to compensate for the missing parts (see
table). The increased prepositions tended to be mostly “of” or “in.” Adjective counts
fluctuated based on style and genre, but they would remain fairly consistent
throughout a given novel.
Part of Speech | % First Sentence | % Other Sentences | First Word % by Type
Prepositions | 14.5 | 10.7-11.9 | 5.5
Articles/his/her | 14.7 | 10-11 | 22.7
Nouns | 13.5 | 13.1 | 7.3
Ambiguous (noun or verb) | 13.5 | 14.3 | 5.2
Adjectives | 8.9 | 7.0-8.7 | 6.1
Pastp verbs | 5.1 | 5.2 | 0.9
Proper nouns | 5.0 | 2.9 | 19.5
Adverbs | 3.2 | 3.4 | 3.2
Subject (he/she/it/you) | 3.1 | 6.1-8.5 | 17.8
Gerund | 2.9 | 2.4 | 0
Is verb | 2.7 | 2.2 | 0
Clause | 2.5 | 3.6-4.3 | 4.1
Other verbs | 2.5 | 3.1-3.5 | 0.6
Conjunctions | 2.35 | 2.4 | 0.9
Help verbs | 1.9 | 3.5 | 0
Contractions (N plus V) | 1.2 | 2.0 | 0
Objects | 0.8 | 1.3 | 0
Said | 0.7 | 1.8-2.3 | 0
Interjection | 0.0 | 0.2 | 0
Zeroing in on the very first word, we can see that two-thirds
of the opening sentences start with a strong article, proper noun, subject, or
adjective. A good choice of subject is usually I or we. The word “it” leads to
meandering passive voice, but this may be the Victorian tone the author is
trying to set. The use of “he” or “she” as the opener immediately raises the
question for the reader: who the heck are you talking about? Even when the title
of the chapter explains who the author is referring to, having to go back and
deduce the information hacks me off. Beginning with a verb, conjunction,
gerund, or interjection is not normal unless it is a past participle used as an
adjective or a command such as a forceful “don’t” inside dialog. I would go so
far as to say that spending your first word on a conjunction is a complete
waste, as is beginning with the vague adverb “there,” or a padding word like
“actually.”
Why is this important past the first few seconds of reading?
Well, the rules for clarity and creativity for the first sentence apply to every chapter and scene break after that.
When readers put down and restart your book, it will likely be at one of these
breakpoints. I can’t tell you the number of times I’ve stopped reading an Indie novel
because the author began every scene with the name of the main character. Boring!
It’s like flipping a coin that always turns up heads. By the third time in a
row, you’re going to know something is wrong. Only it’s worse for proper names
because the odds for repeating them three times at random would be one in 400.
With hundreds of scenes in a novel, it can happen, but it shouldn’t be the default.
FIRST PAGE
In contrast to the opening line, the first page smooths out
to an average reading level of only 6.1 (down 1.3 grades). The average sentence
length is also 2.1 words shorter than the opening sentence. This tells me that many books overextend a little on the first sentence,
trying to shove all the info-dumps in. This happens in Indie books as much as
in traditionally published ones. About all I can say with certainty is that if your
first page is above grade 9.5, you should simplify it.
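For anyone who wants to check their own first page, the standard Gunning fog formula is 0.4 × (average words per sentence + percentage of words with three or more syllables). The sketch below is a simplified version: it uses a rough vowel-group syllable counter and does not apply the usual exclusions for proper nouns and common suffixes, so treat the result as a ballpark. The helper names are my own.

```python
import re

PAGE_WORDS = 250   # page size used in this article
MAX_GRADE = 9.5    # first-page cutoff quoted above


def count_syllables(word):
    """Rough vowel-group count (an assumption, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a trailing silent "e" but keep "-le" endings such as "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def first_page(text, page_words=PAGE_WORDS):
    """Keep only the first `page_words` whitespace-separated words."""
    return " ".join(text.split()[:page_words])


def gunning_fog(text):
    """Simplified Gunning fog grade level for a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z'’]+", text)
    if not sentences or not words:
        return 0.0
    hard_words = [w for w in words if count_syllables(w) >= 3]
    words_per_sentence = len(words) / len(sentences)
    percent_hard = 100 * len(hard_words) / len(words)
    return 0.4 * (words_per_sentence + percent_hard)


def needs_simplifying(text):
    """True when the first page scores above the 9.5 cutoff."""
    return gunning_fog(first_page(text)) > MAX_GRADE
```

As noted earlier, the score gets shaky on very short samples, so run it on the whole page rather than a single sentence.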
What lessons can we apply from what we learned from the
first line? If you start three paragraphs in a row with the same word or type
of lead-in (name, article, gerund, or relative clause), people will notice,
especially since paragraph beginnings stand out on the first page.
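Spotting three identical openers in a row is simple to script. This sketch only compares the literal first word of each paragraph; catching the same type of lead-in (name, article, gerund, or relative clause) would need a part-of-speech tagger, which I have left out. The function names are hypothetical.

```python
import re


def opening_word(paragraph):
    """Lowercased first word of a paragraph, or '' if it has none."""
    match = re.search(r"[A-Za-z'’]+", paragraph)
    return match.group(0).lower() if match else ""


def repeated_openers(paragraphs, run_length=3):
    """Return (start_index, word) for each run of `run_length`
    consecutive paragraphs that begin with the same word."""
    openers = [opening_word(p) for p in paragraphs]
    hits = []
    for i in range(len(openers) - run_length + 1):
        window = openers[i:i + run_length]
        if window[0] and all(word == window[0] for word in window):
            hits.append((i, window[0]))
    return hits
```

The same function works on scene or chapter openings if you pass in the first paragraph of each scene instead.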
You should avoid introducing too many characters right off
the bat. When you do, give them unique names that don’t look or sound alike, so
we know who is who. How many new character/place names is too many on the first
page? The average was 8 ± 6 unique
names. If you have over 17 different uppercase names on page one (not counting
ranks and titles), think hard about trimming. My personal record was 24, where
two people were discussing Dwarves (which I capitalized to denote the race) and
their favorite Sean Connery movies. So these rules of thumb have
exceptions.
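Counting names by eye gets tedious, so here is a rough sketch. It assumes any capitalized word that does not start a sentence is a name, which overcounts ranks, titles, and capitalized words inside dialog. The cutoff of 17 matches the mean plus 1.5 standard deviations (8 + 1.5 × 6); that multiplier is my inference from the quoted numbers, not a stated method.

```python
import re

PAGE_WORDS = 250
MEAN_NAMES, SD_NAMES = 8, 6
# 8 + 1.5 * 6 = 17 reproduces the cutoff in the text. The 1.5 multiplier
# is inferred from the quoted numbers, not a rule the article states.
MAX_NAMES = MEAN_NAMES + 1.5 * SD_NAMES


def unique_names(text, page_words=PAGE_WORDS):
    """Set of capitalized, non-sentence-initial words on the first page."""
    page = " ".join(text.split()[:page_words])
    names = set()
    for sentence in re.split(r"(?<=[.!?])\s+", page):
        tokens = re.findall(r"[A-Za-z'’]+", sentence)
        # Skip the first word, which is capitalized whether or not it is a name.
        for token in tokens[1:]:
            is_pronoun_i = re.fullmatch(r"I(['’]\w+)?", token) is not None
            if token[0].isupper() and not is_pronoun_i:
                names.add(token)
    return names


def too_many_names(text):
    """True when the first page has more unique names than the cutoff."""
    return len(unique_names(text)) > MAX_NAMES
```

A page full of Dwarves and Sean Connery movies will still trip it, so read the list it returns before you start cutting.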
How long should we wait before starting dialog? The graphs were
bimodal on this one, with 12 percent of them jumping in on the first sentence.
The rest of the books waited an average of 190 (±40) words, setting the scene
carefully before anyone speaks. Thus, unless your main character is stranded on
a desert island, you should have some sort of dialog before the top of page two.
But some of those quotes I spotted were air quotes or nicknames. To compensate,
I tracked how far it was to the first flowing
dialog, where one quote ends and another begins with no tags in between. One-third
of my samples never achieved this feat! Now, most of these were due to the
samples being short stories or only 20 percent of a book, but several were because
newbies hadn’t mastered the technique. I came to the conclusion that if the
author didn’t have flowing dialog by the 30K word mark, any that showed up later had
probably occurred by accident. Without these outliers, the average distance to flowing dialog was
1051 ± 1122 words, somewhere between pages 1 and 10. The threshold for starting
too late is around page 15 (3750 words). I may use that as one of my metrics
for whether to buy an e-book from the sample.
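Both distances can be approximated with a couple of regular expressions. This is a sketch under some assumptions of mine: speech is marked with straight or curly double quotes, and flowing dialog means a quote that closes after sentence-ending punctuation and is followed, after only whitespace, by another opening quote. The function names are invented for the example.

```python
import re

PAGE_WORDS = 250
LATE_FLOWING_DIALOG = 15 * PAGE_WORDS  # page 15, i.e. 3750 words

# Assumes straight or curly double quotes mark speech.
OPEN_QUOTE = '["“]'
# Sentence-ending punctuation, a closing quote, whitespace, an opening quote.
BACK_TO_BACK = r'[.!?]["”]\s+["“]'


def words_before(text, position):
    """Number of whitespace-separated words before a character position."""
    return len(text[:position].split())


def first_quote_offset(text):
    """Words read before the first quotation mark, or None if there is none."""
    match = re.search(OPEN_QUOTE, text)
    return words_before(text, match.start()) if match else None


def first_flowing_dialog_offset(text):
    """Words read through the end of the first quote that is followed
    directly by another quote, or None if that never happens."""
    match = re.search(BACK_TO_BACK, text)
    return words_before(text, match.start()) if match else None


def starts_too_late(offset):
    """True when flowing dialog is missing or arrives after page 15."""
    return offset is None or offset > LATE_FLOWING_DIALOG
```

Requiring punctuation before the closing quote is what keeps air quotes and nicknames from counting as dialog.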