2 Preparing copy
2.5 Technical issues for the copy-editor and proofreader
A note about Unicode
Unicode is an international encoding system by which each letter, digit, and symbol is assigned a unique numeric value that applies across different platforms and programs. Each character is assigned a code point and a character name (conventionally written in small caps), e.g. capital A is U+0041 latin capital letter a and a degree symbol is U+00B0 degree sign. The Unicode standard contains over 100,000 characters (with capacity for over a million), and is intended to cover most writing systems worldwide. Unicode-compliant fonts are now widely used; no single font contains all of the characters, but the most common accents and symbols that were once a problem to transfer from one type of hardware or software to another can now be successfully reproduced.
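As a minimal sketch of how a code point and character name can be looked up, the following uses Python's standard unicodedata module. The helper name describe is an illustrative assumption, not part of any standard.

```python
import unicodedata


def describe(char: str) -> str:
    """Return the code point and Unicode character name of one character."""
    # Code points are conventionally written as U+ plus at least four hex digits.
    code_point = f"U+{ord(char):04X}"
    # Fall back to a placeholder for characters that have no assigned name.
    name = unicodedata.name(char, "<unnamed>")
    return f"{code_point} {name}"


print(describe("A"))  # U+0041 LATIN CAPITAL LETTER A
print(describe("°"))  # U+00B0 DEGREE SIGN
```

The standard library reports names in full capitals, which is the usual plain-text substitute for small caps.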
There are several input methods for inserting non-keyboard characters into a document, including character mapping applets, or certain keyboard combinations that involve the Alt key and the hexadecimal code point, or the decimal equivalent. For example, in a Windows application, type the hexadecimal code point after a space, then press Alt and X together, and release the keys to insert the character. In HTML special sorts are inserted with an entity tag, such as &nbsp; for a nonbreaking space. Use your preferred input method but strive to use the correct Unicode character, especially for special sorts, as many look very similar, for example the Greek lower-case β and the German eszett ß. Code points and character names for various symbols and spaces are therefore used in Hart’s to encourage good practice (see
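Because look-alike characters are hard to tell apart by eye, it can help to check them by code point. The sketch below, which assumes nothing beyond Python's standard library, converts a hexadecimal code point to its character, resolves the HTML entity for a nonbreaking space (&nbsp;), and prints the names of the two similar characters mentioned above. The function name from_hex is hypothetical.

```python
import html
import unicodedata


def from_hex(code_point: str) -> str:
    """Return the character for a hexadecimal code point such as '00DF'."""
    return chr(int(code_point, 16))


# The two look-alikes are different characters with different code points.
for char in ("β", "ß"):
    print(f"U+{ord(char):04X}", unicodedata.name(char))
# U+03B2 GREEK SMALL LETTER BETA
# U+00DF LATIN SMALL LETTER SHARP S

# An HTML entity resolves to the same character as its code point.
print(html.unescape("&nbsp;") == from_hex("00A0"))  # True
```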
The height of type and of vertical spaces is measured in points. A pica is the standard unit of typographic measurement, equal to 12 points; the pica is used particularly to express the total amount of space a text will require, and the text measure (or width of a full line of type) is usually defined in picas.
Vertical or interlinear spacing within the body of the text is called leading (from the strips of lead formerly inserted between lines of type). The amount of leading affects readability—text that is ‘set solid’ (that is, without interlinear space) can be tiring to the eye, but too much leading also interferes with ease of reading, and of course takes up more space. The term ‘leading’ is sometimes also used to refer to the distance from the bottom of one line of type to the bottom of the next, but to avoid confusion it is better to refer to this as the linefeed.
The description of the type size is sometimes found on a title page verso. ‘11 on 12 point’ or ‘11/12 point’, for example, indicates that the lines of type are 12 points apart and the text is set in 11 point; in this case, therefore, there is leading of 1 point between the lines of type.
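The arithmetic behind such a description is simple enough to sketch. The example below assumes the specification is written either as '11/12 point' or as '11 on 12 point'; the function names are illustrative only.

```python
POINTS_PER_PICA = 12


def parse_type_spec(spec: str) -> tuple[float, float]:
    """Split '11/12 point' or '11 on 12 point' into (type size, linefeed)."""
    normalized = spec.lower().replace("point", "").replace(" on ", "/")
    size, linefeed = (float(part) for part in normalized.split("/"))
    return size, linefeed


def leading(spec: str) -> float:
    """Leading is the linefeed minus the type size, both in points."""
    size, linefeed = parse_type_spec(spec)
    return linefeed - size


def points_to_picas(points: float) -> float:
    """Convert a measurement in points to picas (12 points to the pica)."""
    return points / POINTS_PER_PICA


print(leading("11 on 12 point"))  # 1.0
print(leading("11/12 point"))     # 1.0
print(points_to_picas(264))       # 22.0
```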
Copy can be arranged within the text area in one of four ways: ranged left (or flush left) so that the left-hand side is aligned but the right is uneven (ragged right or unjustified); ranged right so that the right-hand side is aligned and the left ragged; justified so that both left- and right-hand sides are aligned to the limits of the ‘text measure’; or centred so that each line is balanced on the midpoint of the text measure.
Justified copy is produced by evenly varying the spaces between words on each line; spaces that are not permitted to vary in this way are called ‘fixed’ spaces (such as that between a note cue and the first word of the note). Ragged and centred text has invariable word spacing, as does poetry. Justified text characteristically employs line-end hyphenation to avoid excessive word spacing; ragged text characteristically does not employ it, or uses it to only a limited extent.
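To illustrate how word spaces vary in justified setting, the following sketch shares the unused width of a line equally among the gaps between words. It is a simplification under stated assumptions: the word widths are invented figures in points, there are no fixed spaces in the line, and real composition software also weighs hyphenation and minimum and maximum space limits.

```python
def justified_word_space(word_widths: list[float], measure: float) -> float:
    """Return the width of each word space needed to fill the measure."""
    gaps = len(word_widths) - 1
    if gaps < 1:
        raise ValueError("a line needs at least two words to be justified")
    # Whatever width the words leave over is divided evenly among the gaps.
    return (measure - sum(word_widths)) / gaps


# Hypothetical line: five words set to a measure of 22 picas (264 points).
print(justified_word_space([40, 55, 30, 62, 47], 264))  # 7.5
```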
An em is a unit for measuring the width of printed matter, originally reckoned as the width of a capital roman M, but in digital fonts equal to the current typesize, so an em in 10 point text is 10 points wide. An em space is indicated in hard-copy markup by the symbol □. The Unicode code point is U+2003. An en space (U+2002) is a unit of horizontal space equal to half an em. For em rules and en rules see
A thin space (U+2009) is a fifth (sometimes a sixth) of an em space and is usually indicated in hard-copy markup by the symbol ⫯. A hair space (U+200A) is a very thin space, thinner (sometimes by half) than a thin space. Both are used in contexts where the visual relationship of characters requires some spacing but not full word spacing (for example, where two punctuation marks follow each other). Both are fixed or invariable, and both generally behave as nonbreaking (that is, the matter preceding and following them cannot be separated at a line ending but is treated as a single character). A popular alternative to the thin space is the character no-break space (U+00A0).
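The widths of the fixed spaces follow directly from the type size. The sketch below assumes a digital font in which the em equals the type size; the thin space is taken as a fifth of an em by default, with a sixth available through the illustrative thin_divisor parameter. The hair space is omitted because its width is not fixed.

```python
import unicodedata

# Code points given in the text for the fixed spaces.
FIXED_SPACES = {"em": "\u2003", "en": "\u2002", "thin": "\u2009", "hair": "\u200a"}


def space_widths(type_size: float, thin_divisor: int = 5) -> dict[str, float]:
    """Return the em, en, and thin space widths in points for a type size."""
    return {
        "em": type_size,
        "en": type_size / 2,
        "thin": type_size / thin_divisor,
    }


print(space_widths(10))  # {'em': 10, 'en': 5.0, 'thin': 2.0}

for label, char in FIXED_SPACES.items():
    print(label, f"U+{ord(char):04X}", unicodedata.name(char))
```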
A single word space is used after all sentence punctuation (not a double space, as was conventional in typewritten text). TeX (specialist software for composing technical material such as mathematics; see
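Double spaces after sentence punctuation can be reduced to single spaces mechanically. The following is a minimal sketch using a regular expression; it assumes sentences end in a full point, exclamation mark, or question mark, and it does not handle a closing quotation mark or bracket after the punctuation.

```python
import re


def single_sentence_spaces(text: str) -> str:
    """Replace two or more spaces after sentence punctuation with one."""
    return re.sub(r"([.!?]) {2,}", r"\1 ", text)


print(single_sentence_spaces("One.  Two!   Three?  Four."))
# One. Two! Three? Four.
```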
The first line of a paragraph after a heading, epigraph, or section break should be set full out to the left, and subsequent paragraphs indented. For work that is to be typeset or converted to XML, it is preferable to apply a word-processor style with a built-in indent rather than keying spaces or tabs. Alternatively some styles prefer paragraphs separated by a space with the first line of each new paragraph set full out to the left. This is most commonly found on websites, in reports and some kinds of reference work (such as this one).
2.5.2 General principles for typeset matter
Besides checking the accuracy of the typesetter’s work, the proofreader is charged with ensuring that the page is presented so as to be easy to read and pleasing to the eye. To this end some generally accepted rules have become established, though the extent to which they are adhered to depends on circumstances and design considerations.
Traditionally, printers ensured that the last line of a paragraph did not consist of a single syllable, or numerals alone, or a word of fewer than five characters. This rule is no longer followed strictly, but others controlling the position on the page of short lines are still usually observed. The last line of a paragraph should not fall at the top of a new page or column: this is known as a widow. An orphan—the first line of a paragraph that falls at the bottom of a page or column—is undesirable, though it is now tolerated in most bookwork.
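A check for widows and orphans can be sketched as follows. The model is deliberately simple and rests on assumptions not drawn from the text: every page holds the same number of lines, paragraphs follow one another with no space or headings between them, and paragraphs of a single line are ignored.

```python
def find_widows_and_orphans(
    paragraph_lengths: list[int], lines_per_page: int
) -> list[tuple[int, str]]:
    """Return (paragraph index, problem) pairs for a run of paragraphs."""
    problems = []
    line = 0  # zero-based position of the next line in the continuous flow
    for index, length in enumerate(paragraph_lengths):
        first, last = line, line + length - 1
        if length > 1:
            # Orphan: the paragraph's first line is the last line of a page.
            if first % lines_per_page == lines_per_page - 1:
                problems.append((index, "orphan"))
            # Widow: the paragraph's last line is the first line of a page.
            if last % lines_per_page == 0:
                problems.append((index, "widow"))
        line += length
    return problems


# Hypothetical make-up: ten lines to the page, four paragraphs.
print(find_widows_and_orphans([4, 5, 3, 9], 10))
# [(2, 'orphan'), (3, 'widow')]
```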
No more than two successive lines should begin or end with the same word. Although practice is now less carefully controlled than formerly, many publishers set a limit to the number of lines in succession that may end with hyphens (typically, not more than three or four). The last line on a recto page should not end with a hyphen. Columns, lists, etc. should ideally not be split; if they are split the break should be in as unobtrusive a place as possible. For information on word division see
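A limit on successive line-end hyphens lends itself to a mechanical check. The sketch below assumes the set lines are available as plain strings and that a line-end hyphen is an ordinary hyphen-minus; the default limit of three is one of the typical values mentioned above, not a fixed rule.

```python
def hyphen_runs(lines: list[str], limit: int = 3) -> list[tuple[int, int]]:
    """Return (start index, run length) for runs of hyphen-ended lines over the limit."""
    runs = []
    run_start = None
    # An empty sentinel line closes any run that reaches the final line.
    for index, line in enumerate(list(lines) + [""]):
        if line.rstrip().endswith("-"):
            if run_start is None:
                run_start = index
        else:
            if run_start is not None and index - run_start > limit:
                runs.append((run_start, index - run_start))
            run_start = None
    return runs


sample = ["typo-", "graph-", "ical con-", "ven-", "tions", "vary-", "ing"]
print(hyphen_runs(sample))           # [(0, 4)]
print(hyphen_runs(sample, limit=4))  # []
```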
Pages should all be the same depth, so that matter aligns across the head and foot of the spread. The same provision applies to columns in multi-column setting. If absolutely necessary to avoid awkward page breaks, facing pages may be made a line short or long. Complete pages of material set in a type size different from that of the main text (appendices, notes, etc.) should be made up to the depth of the text page, to the nearest line.
Interlinear spacing should be uniform in normal texts. Where non-text items occur, extra space (or ‘style space’) is left between the illustration, figure, or table and the surrounding text. In the interests of preserving a constant page depth or avoiding awkward page breaks, style space may be slightly reduced or increased at need. Complex texts such as dictionaries may achieve equal page depth by varying the leading in adjacent columns or pages. See also