Monday, April 27, 2020

Opening Salvos


I analyzed first sentences and first pages from 566 public-domain and free novels to better understand what makes a good book beginning. From a publishing/programming standpoint, we’ll call a page 250 words. Clearly, math tools won’t write the perfect opening for you, but they can, like spell-checking your resume, prevent readers from dismissing you before getting to the end. I don’t know what ideal is, but I can look at the bell curve of hundreds of attempts and tell you what the bottom five percent looks like.
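As a rough illustration of the two units measured here, the sketch below pulls the first sentence and the first 250-word "page" out of a block of text. The function names are hypothetical, and the sentence splitter is deliberately naive (an abbreviation such as "Mr." will end the sentence early), so treat it as a starting point rather than the tooling behind the numbers in this post.

```python
import re

PAGE_WORDS = 250  # the post's working definition of a page

# Shortest run of text ending in . ! or ?, plus any closing quote or bracket,
# followed by whitespace or the end of the text.
_SENTENCE = re.compile(r""".+?[.!?]["'”’)]*(?=\s|$)""", re.DOTALL)


def first_page(text, page_words=PAGE_WORDS):
    """Return the first `page_words` whitespace-separated words as one string."""
    return " ".join(text.split()[:page_words])


def first_sentence(text):
    """Return the text up to the first sentence-ending punctuation mark."""
    stripped = text.strip()
    match = _SENTENCE.search(stripped)
    # Fall back to the whole text when no end punctuation is found.
    return match.group(0) if match else stripped


if __name__ == "__main__":
    sample = "The rain stopped. Nobody noticed."
    print(first_sentence(sample))
    print(len(first_page(sample).split()), "words on page one")
```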

FIRST SENTENCE

The first sentence should ideally grab your reader and demand attention. I’m not talking about using an exclamation point. The best ones set the tone and character without the reader realizing how much you’ve communicated. Often, they state a theme that the book is going to prove. I only saw one “it was a dark and stormy night.” I wish I could give good and bad examples below, but copyright restrictions prevent that. I can relay observations. The average grade level is 7.4, although applying the Gunning-Fog reading level to just one sentence isn’t reliable, so we’ll need other indicators.
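For readers who want to try the reading-level measurement themselves, the Gunning fog index is usually given as 0.4 × (average words per sentence + percentage of words with three or more syllables). The sketch below is a simplified version under stated assumptions: the syllable counter is a crude vowel-group heuristic, and it does not exclude proper nouns or common suffixes the way the full definition does, so its scores will not match a dedicated readability tool exactly.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing 'e' as silent ("grade"), but keep "-le" ("table").
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def gunning_fog(text):
    """Approximate Gunning fog grade level for a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z'’]+", text)
    if not sentences or not words:
        return 0.0
    hard = sum(1 for w in words if count_syllables(w) >= 3)
    return 0.4 * (len(words) / len(sentences) + 100 * hard / len(words))


if __name__ == "__main__":
    print(round(gunning_fog("The cat sat on the mat."), 1))
```

With a single sentence, the "words per sentence" term is just that sentence's length, which is one reason the score swings so widely on opening lines.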

One thing you want to avoid is proper-noun soup where you bombard the reader with long, foreign words until they give up. The samples averaged 1.3 names per opening sentence, with a record of 9 names. Statistically, anything over 3 is excessive. If you have a military space opera, though, you can safely make it four because those readers are accustomed to ranks being part of the name.

How long is an average first sentence? However long it takes. I’ve seen them everywhere from a one-word expletive to 112 words for “Les Mis.” However, if I cut out the few ancient tomes over the 100-word level, we’re left with a pretty consistent average of 17 words, plus or minus a 10-word standard deviation. Thus, if you take more than 32 words (one and a half standard deviations above the average) to grab your reader, it’s probably too much. For hard words (3+ syllables), the sentences averaged only one occurrence, with that usually being part of a name or an adverb. Having over 3 hard words should be a red flag unless you’re writing a medical thriller. What about commas, another sign of complexity? Of the sentences I scanned, exactly half had any. Use one if you need it. The eight percent with more than two are probably risking their audience, and the guy with seven is daring them to leave.
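The three rules of thumb in this paragraph (more than 32 words, more than 3 hard words, more than 2 commas) are easy to turn into a checklist. This is a minimal sketch with hypothetical names; the syllable counter is a crude vowel-group estimate, so the hard-word count is approximate.

```python
import re

MAX_WORDS = 32       # thresholds taken from the paragraph above
MAX_HARD_WORDS = 3
MAX_COMMAS = 2


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1  # trailing 'e' is usually silent
    return max(count, 1)


def check_opening_sentence(sentence):
    """Return a list of warnings; an empty list means no red flags."""
    words = re.findall(r"[A-Za-z'’]+", sentence)
    hard = [w for w in words if count_syllables(w) >= 3]
    commas = sentence.count(",")
    flags = []
    if len(words) > MAX_WORDS:
        flags.append(f"{len(words)} words (over {MAX_WORDS})")
    if len(hard) > MAX_HARD_WORDS:
        flags.append(f"{len(hard)} hard words (over {MAX_HARD_WORDS})")
    if commas > MAX_COMMAS:
        flags.append(f"{commas} commas (over {MAX_COMMAS})")
    return flags


if __name__ == "__main__":
    print(check_opening_sentence("Tired, hungry, cold, and lost, she knocked."))
```

The thresholds are kept as module-level constants so a medical-thriller writer can raise the hard-word limit without touching the logic.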

What should this opening sentence be composed of? First, I will examine the verbs used. The most common, by far, were forms of IS. Note the exponential curve, where SAID appears half as often. The remainder of the top achievers were all action (total 33%+), sensory (8%), or recall (5%) verbs. These all make sense because they set the tone/mood for your scene and pull the reader in—except begin/start, which stood out as weak and could have been eliminated to make a better opening. People tend to die, fall, or awaken much more in the first line for the sake of drama. They also stare rather than look.

Verb        % of first lines    % of first page
is          11.9                9.1
said        5.3                 6.0
know        2.6                 2.6
come        2.3                 1.9
stand       2.1                 0.9
begin       1.8                 1.0
sit         1.8                 0.8
see         1.8                 2.0
go          1.4                 1.2
make        1.4                 1.6
take        1.4                 1.3
die         1.2                 0.4
stare       1.1                 0.3
think       1.1                 1.2
wake        1.1                 0.3
fall        0.9                 0.5
hear        0.9                 0.8
glance      0.8                 0.3
look        0.8                 1.5
remember    0.8                 0.3
believe     0.6                 0.6
check       0.6                 0.3
feel        0.6                 0.9
move        0.6                 0.3
wait        0.6                 0.4
watch       0.6                 0.4

What also speaks volumes are the five common words in the rest of the novel that we never see in the first line: get, seem, keep, try, and happen. I’ve found that I can replace almost all instances of the overused word “get” in my writing with stronger/more specific ones. I suspect the same may be true of some of the other passive ones.

Did any sentence pattern establish itself as dominant? Not really. Not all of the sentences were even complete. In the span of an entire novel, I see few patterns occur more than 0.5 percent of the time. The most common are usually:
SV    SVAN    SVN    SVJN

For opening lines, I saw these in lower concentrations plus a few others, mainly with AJN instead of S and a wide variety of prepositional phrases. The range is so extreme and sparse that I could make no further generalizations.
SVRSVANPAN    AJNVJ    AJNVJPANPNPO    JNVJN    SVJPN
SVPJN    SVPAN    SVPN

The next thing to examine is how the parts of speech in the first sentence differ from those in all the other sentences of a novel. First sentences have no profanity or interjections to speak of, and fewer pronouns, contractions, verbs, and objects. When you think about it, proper nouns have to be used before the pronouns or objects that represent them. Contractions should only be used in dialog, so those should occur less often. Openings use more prepositions, articles, and proper names to compensate for the missing parts. The increased prepositions tended to be mostly “of” or “in.” Adjective counts fluctuated based on style and genre, but they would remain fairly consistent throughout a given novel. (See the table below.)

Part of speech             % first sentence    % other sentences    First word % by type
Prepositions               14.5                10.7-11.9            5.5
Articles/his/her           14.7                10-11                22.7
Nouns                      13.5                13.1                 7.3
Ambiguous (noun or verb)   13.5                14.3                 5.2
Adjectives                 8.9                 7.0-8.7              6.1
Pastp verbs                5.1                 5.2                  0.9
Proper nouns               5.0                 2.9                  19.5
Adverbs                    3.2                 3.4                  3.2
Subject (he/she/it/you)    3.1                 6.1-8.5              17.8
Gerund                     2.9                 2.4                  0
Is verb                    2.7                 2.2                  0
Clause                     2.5                 3.6-4.3              4.1
Other verbs                2.5                 3.1-3.5              0.6
Conjunctions               2.35                2.4                  0.9
Help verbs                 1.9                 3.5                  0
Contractions (N plus V)    1.2                 2.0                  0
Objects                    0.8                 1.3                  0
Said                       0.7                 1.8-2.3              0
Interjection               0.0                 0.2                  0

Zeroing in on the very first word, we can see that two-thirds of the opening sentences start with an article, a proper noun, a subject, or an adjective. A good choice of subject is usually I or we. The word “it” leads to meandering passive voice, but this may be the Victorian tone the author is trying to set. The use of “he” or “she” as the opener immediately raises the question for the reader: who the heck are you talking about? Even when the title of the chapter explains who the author is referring to, having to go back and deduce the information hacks me off. Beginning with a verb, conjunction, gerund, or interjection is not normal unless it is a past participle used as an adjective or a command such as a forceful “don’t” inside dialog. I would go so far as to say that spending your first word on a conjunction is a complete waste, as is beginning with the vague adverb “there,” or a padding word like “actually.”

Why is this important past the first few seconds of reading? Well, the rules for clarity and creativity for the first sentence apply to every chapter and scene break after that. When readers put down and restart your book, it will likely be at one of these breakpoints. I can’t tell you the number of times I’ve stopped an Indie novel because the author began every scene with the name of the main character. Boring! It’s like flipping a coin that always turns up heads. By the third time in a row, you’re going to know something is wrong. Only it’s worse for proper names because the odds of repeating them three times at random would be one in 400. With hundreds of scenes in a novel, it can happen, but it shouldn’t be the default.

FIRST PAGE

In contrast to the opening line, the first page smooths out to an average reading level of only 6.1 (down 1.3 grades). The average sentence length is also 2.1 words shorter than the opening sentence. This tells me that many books overextend a little on the first sentence, trying to shove all the info-dumps in. This happens with Indie books as much as with traditionally published ones. About all I can say with certainty is that if your first page is above grade 9.5, you should simplify it.

What lessons can we apply from what we learned from the first line? If you start three paragraphs in a row with the same word or type of lead-in (name, article, gerund, or relative clause), people will notice, especially since paragraph beginnings stand out on the first page.
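The simplest version of this check, three paragraphs in a row starting with the same word, can be sketched as below. The function name is hypothetical, and the sketch compares literal first words only; spotting the same type of lead-in (article, gerund, relative clause) would need a part-of-speech tagger, which is outside the scope of this example.

```python
def repeated_openers(paragraphs, run_length=3):
    """Find runs of `run_length` consecutive paragraphs with the same first word.

    Returns a list of (index of first paragraph in the run, shared word).
    """
    first_words = []
    for paragraph in paragraphs:
        tokens = paragraph.split()
        # Strip quotes and punctuation so '"John,' and 'John' compare equal.
        word = tokens[0].strip("\"'“”‘’.,;:!?").lower() if tokens else ""
        first_words.append(word)

    hits = []
    for i in range(len(first_words) - run_length + 1):
        window = first_words[i:i + run_length]
        if window[0] and len(set(window)) == 1:
            hits.append((i, window[0]))
    return hits


if __name__ == "__main__":
    page = [
        "John woke before dawn.",
        "John, still groggy, found his boots.",
        "John left without a word.",
        "The door swung shut behind him.",
    ]
    print(repeated_openers(page))
```

The same function works on scene openings if you pass it the first paragraph of each scene instead of consecutive paragraphs.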

You should avoid introducing too many characters right off the bat. When you do, give them unique names that don’t look or sound alike, so we know who is who. How many new character/place names is too many on the first page? The average was 8 ± 6 unique names. If you have over 17 different uppercase names on page one (not counting ranks and titles), think hard about trimming. My personal record was 24, where two people were discussing Dwarves (which I capitalized to denote the race) and their favorite Sean Connery movies. So these rules of thumb have exceptions.
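Counting uppercase names can be approximated without any language tools. The sketch below is one possible heuristic, not necessarily the method used for the figures above: it collects capitalized words that are not at the start of a sentence, skipping the pronoun "I" and a small, assumed list of ranks and titles. It will miss names that only ever open a sentence, and it counts "Sean" and "Connery" as two separate names.

```python
import re

# Assumed, incomplete list of ranks and titles to ignore.
TITLES = {
    "mr", "mrs", "ms", "dr", "sir", "lady", "lord", "captain",
    "major", "colonel", "general", "sergeant", "lieutenant", "admiral",
}
MAX_PAGE_NAMES = 17  # threshold from the paragraph above


def unique_names(page):
    """Return the set of capitalized, non-sentence-initial words on a page."""
    names = set()
    prev_word = ""
    sentence_start = True
    for match in re.finditer(r"[A-Za-z][A-Za-z'’-]*|[.!?]", page):
        token = match.group(0)
        if token in ".!?":
            # A period after a title ("Mr.") does not end the sentence.
            sentence_start = prev_word.lower() not in TITLES
            continue
        looks_like_name = (
            token[0].isupper() and token != "I" and token.lower() not in TITLES
        )
        if looks_like_name and not sentence_start:
            names.add(token)
        prev_word = token
        sentence_start = False
    return names


if __name__ == "__main__":
    page = ("Yesterday Mr. Baggins met Gimli and Sean Connery in Edinburgh. "
            "He said I should visit Edinburgh.")
    found = unique_names(page)
    print(sorted(found), "too many" if len(found) > MAX_PAGE_NAMES else "ok")
```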

How long should we wait before starting dialog? The graphs were bimodal on this one, with 12 percent of the books jumping in on the first sentence. The rest of the books waited an average of 190 (± 40) words, setting the scene carefully before anyone speaks. Thus, unless your main character is stranded on a desert island, you should have some sort of dialog before the top of page two. But some of those quotes I spotted were air quotes or nicknames. To compensate, I tracked the distance to the first flowing dialog, where one quote ends and another begins with no tags in between. One-third of my samples never achieved this feat! Now, most of these were due to the samples being short stories or only 20 percent of a book, but several were because newbies hadn’t mastered the technique. I came to the conclusion that if the author didn’t have flowing dialog by the 30K-word mark, any that appeared later had probably occurred by accident. Without these outliers, the average distance to flowing dialog was 1,051 ± 1,122 words, somewhere between pages 1 and 10. The threshold for starting too late is around page 15 (3,750 words). I may use that as one of my metrics for whether to buy an e-book from the sample.
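Both distances described here can be estimated with two regular expressions. This is a minimal sketch with hypothetical function names: the first function counts the words before any opening double quote (so air quotes and nicknames still register), and the second counts the words before the first spot where a closing quote is followed by nothing but whitespace and then another opening quote.

```python
import re

PAGE_WORDS = 250
LATE_FLOWING_DIALOG = 15 * PAGE_WORDS  # page 15, i.e. 3,750 words


def _words_before(text, index):
    """Number of whitespace-separated words that precede `index`."""
    return len(text[:index].split())


def first_quote_distance(text):
    """Words before the first opening double quote, or None if there is none."""
    match = re.search(r'["“]', text)
    return _words_before(text, match.start()) if match else None


def first_flowing_dialog_distance(text):
    """Words before the first back-to-back pair of quotes with no tag between."""
    match = re.search(r'["”]\s+["“]', text)
    return _words_before(text, match.start()) if match else None


if __name__ == "__main__":
    sample = (
        "The hall was empty. “Anyone home?” Mara called.\n\n"
        "“Up here.”\n\n"
        "“Coming.”"
    )
    print(first_quote_distance(sample))
    flowing = first_flowing_dialog_distance(sample)
    print(flowing, "late" if flowing is None or flowing > LATE_FLOWING_DIALOG else "ok")
```

A result of None from the second function corresponds to the samples that never reached flowing dialog at all.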