User:HarJIT/sandbox/Halfwidth-fullwidth typography

From Wikipedia, the free encyclopedia
A command prompt (cmd.exe) with Korean localisation showing halfwidth and fullwidth characters

In CJK (Chinese, Japanese and Korean) computing, graphic characters are traditionally classed into fullwidth (in Taiwan and Hong Kong: 全形; in mainland China and Japan: 全角) and halfwidth (in Taiwan and Hong Kong: 半形; in mainland China and Japan: 半角) characters. With duospaced fonts, a halfwidth character occupies half the width of a fullwidth character, hence the name.

Rationale

Comparison of monospaced and duospaced display.

In the days of text-mode computing, Western characters were normally laid out in a grid on the screen, often 80 columns by 24 or 25 lines. Each character was displayed as a small dot matrix, often about 8 pixels wide, in a monospaced font, and an SBCS (single-byte character set) was generally used to encode the characters of Western languages.

For a number of practical and aesthetic reasons, Han characters need to be square, approximately twice as wide as these fixed-width SBCS characters. Accordingly, on fixed-width displays, duospaced fonts (or duospace fonts) were used, in which letters and characters occupy either a single or a double amount of a specified, fixed horizontal space.[1] As these characters were typically encoded in a DBCS (double-byte character set), their length on screen in a duospaced font was also proportional to their byte length.
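The proportionality between byte length and screen width can be checked with a short Python sketch using the standard `unicodedata` module. It assumes a display that draws East Asian Wide (W) and Fullwidth (F) characters in two columns and everything else in one; the helper name `display_columns` is hypothetical. Characters classed as Ambiguous, such as Greek letters, are counted as narrow here even though they are double-byte in Shift JIS, so the equality only holds for text like the sample below.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Columns on a duospaced display: W and F characters take 2, all others 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)


sample = "ABC漢字ｶﾅ"  # Latin letters, kanji, half-width katakana

sjis_bytes = len(sample.encode("shift_jis"))  # legacy DBCS-based encoding
utf8_bytes = len(sample.encode("utf-8"))      # modern multibyte encoding
columns = display_columns(sample)

print(f"Shift JIS: {sjis_bytes} bytes, UTF-8: {utf8_bytes} bytes, {columns} columns")
# → Shift JIS: 9 bytes, UTF-8: 15 bytes, 9 columns
```

In Shift JIS the byte count and the column count coincide, so software that assumed one byte per column laid the text out correctly; in UTF-8 the same text takes 15 bytes and that assumption no longer holds.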

Some terminals and editing programs could place double-byte characters only at even columns, not odd ones, and some could not mix double-byte and single-byte characters on the same line at all. DBCS sets therefore generally also included Roman letters and digits, for use alongside CJK characters on the same line.
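The even-column restriction can be modelled with a small Python sketch. It is a hypothetical illustration, not how any particular terminal worked: it assumes columns are numbered from 0, that double-width characters may only start at even columns, and that a padding space is inserted whenever one would otherwise start at an odd column. The function names are invented for this example.

```python
import unicodedata


def is_wide(ch: str) -> bool:
    """Treat East Asian Wide and Fullwidth characters as double-width."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def align_wide_to_even(text: str) -> str:
    """Insert a space wherever a wide character would start on an odd column.

    Hypothetical model: columns numbered from 0, wide characters allowed
    only at even columns.
    """
    out = []
    col = 0
    for ch in text:
        if is_wide(ch) and col % 2 == 1:
            out.append(" ")  # pad so the wide character starts on an even column
            col += 1
        out.append(ch)
        col += 2 if is_wide(ch) else 1
    return "".join(out)


print(repr(align_wide_to_even("A漢B字")))  # → 'A 漢B 字'
print(repr(align_wide_to_even("Ａ漢")))    # → 'Ａ漢' (fullwidth A keeps the grid even)
```

Mixing single-width letters with double-width characters forces padding, whereas a line built only from double-byte characters, including the DBCS set's own fullwidth Roman letters and digits, never needs any.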

On the other hand, early Japanese computing used a single-byte code page called JIS X 0201 for katakana. These would be rendered at the same width as the other single-byte characters, making them half-width kana characters rather than normally proportioned kana. Although the JIS X 0201 standard itself did not specify half-width display for katakana, this became the visually distinguishing feature in Shift JIS between the single-byte JIS X 0201 and double-byte JIS X 0208 katakana. Some IBM code pages used a similar treatment for Korean jamo, based on the N-byte Hangul code and its EBCDIC translation.
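Python's built-in `shift_jis` codec shows the distinction described above: the JIS X 0201 katakana ka is stored as one byte and the JIS X 0208 katakana ka as two, so on a duospaced display the first takes half the width of the second. This is a minimal sketch; only the two example characters are assumptions.

```python
# Half-width (JIS X 0201) and full-width (JIS X 0208) katakana ka in Shift JIS.
half = "\uff76"  # HALFWIDTH KATAKANA LETTER KA
full = "\u30ab"  # KATAKANA LETTER KA

for ch in (half, full):
    encoded = ch.encode("shift_jis")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ').upper()} ({len(encoded)} byte(s))")
# → U+FF76 -> B6 (1 byte(s))
# → U+30AB -> 83 4A (2 byte(s))
```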

Although associated with Asian character sets, the general notion of duospaced fonts is not limited to such characters. Examples of duospaced characters not strictly associated with Asian halfwidth and fullwidth forms include various technical and pictographic symbols in the Migu 2M font, as well as U+2188 ROMAN NUMERAL ONE HUNDRED THOUSAND and various other symbols in GNU Unifont.

What a code actually encodes

An individual code in a legacy DBCS, such as Big5, does not always represent a complete semantic unit. Codes for logograms always represent complete logograms, but codes for symbols, punctuation or special characters do not always represent complete graphical characters. What is encoded is a particular graphical representation of a character, or of part of a character, that happens to fit in the space taken by two monospaced ASCII characters. This is a property of double-byte character sets as normally used in CJK computing, not a problem unique to Big5.

This needs some historical perspective, since in theory an encoding should not depend on how its characters are displayed. When text-mode personal computing was still the norm, characters were normally represented as single bytes, and each character took one position on the screen. There was therefore a practical reason to insist that double-byte characters take up two positions on the screen: off-the-shelf, American-made software would then be usable without modification on a DBCS-based system. If a character could take an arbitrary number of screen positions, software that assumed one byte of text takes one screen position would produce incorrect output. A manufacturer whose computers never had to deal with a text screen had no need to enforce this artificial restriction; the Apple Macintosh is an example. Nevertheless, the encoding itself had to be designed so that it worked correctly on text-screen-based systems.

To illustrate this point, consider the Big5 code 0xa14b (…). To English speakers this looks like an ellipsis, and the Unicode standard identifies it as such. In Chinese, however, the ellipsis consists of six dots that fill the space of two Chinese characters (……). There is therefore no Big5 code for the Chinese ellipsis: 0xa14b represents only half of one, because the whole ellipsis must take the space of two Chinese characters, and in many DBCS systems one DBCS character must take exactly the space of one Chinese character.
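The Big5 example can be reproduced with Python's built-in `big5` codec, in which 0xa14b decodes to U+2026 HORIZONTAL ELLIPSIS. This sketch assumes that codec's mapping; it shows that a full Chinese ellipsis needs two Big5 codes (four bytes), and that Unicode classes U+2026 as East Asian Ambiguous, meaning its display width depends on context.

```python
import unicodedata

half_ellipsis = b"\xa1\x4b".decode("big5")  # one Big5 code, two bytes
print(f"U+{ord(half_ellipsis):04X} {unicodedata.name(half_ellipsis)}")

# A full Chinese ellipsis (six dots) needs two Big5 codes, i.e. four bytes.
chinese_ellipsis = half_ellipsis * 2
print(chinese_ellipsis.encode("big5").hex(" ").upper())

# Ambiguous ("A"): narrow in Western text, wide in East Asian contexts.
print(unicodedata.east_asian_width(half_ellipsis))
```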

Some codes represent things that cannot readily be used in plain text files at all. One example is the "citation mark" (﹋, Big5: 0xa1ca), which, when used, must be typeset under the title of a literary work. Another is the Suzhou numerals, a form of scientific notation that requires the number to be laid out in two dimensions, in at least two rows.

Half-width kana

Half-width kana (半角カナ, hankaku kana) are katakana characters displayed at half their normal width (a 1:2 aspect ratio) instead of the usual square (1:1) aspect ratio. For example, the usual (full-width) form of the katakana ka is カ, while the half-width form is ｶ. Unicode does not encode half-width hiragana, although it can be displayed on the Web or in e-books through the CSS declaration font-feature-settings: "hwid" 1 with OpenType fonts based on Adobe-Japan1-6.[2] Half-width kanji likewise cannot be used on modern computers, even though they appear on some receipt printers, electronic bulletin boards and old computers.[3]
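Unicode encodes half-width katakana as compatibility characters, so NFKC normalisation folds them into their ordinary full-width forms. In the half-width set the voiced sound mark (dakuten) is a separate character, which NFKC composes with the preceding kana. A minimal sketch with Python's standard `unicodedata` module, using ka and ga as assumed examples:

```python
import unicodedata

# Half-width ka, then half-width ka followed by the separate half-width voiced sound mark.
for text in ("\uff76", "\uff76\uff9e"):
    folded = unicodedata.normalize("NFKC", text)
    print(f"{text!r} ({len(text)} code point(s)) -> {folded!r} ({len(folded)})")
# → 'ｶ' (1 code point(s)) -> 'カ' (1)
# → 'ｶﾞ' (2 code point(s)) -> 'ガ' (1)
```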

Half-width kana were used in the early days of Japanese computing to allow Japanese characters to be displayed on the same grid as monospaced Latin characters; half-width kanji generally were not. Half-width kana are not generally used today, but they still appear in specific settings such as cash register displays, shop receipts, Japanese digital television and DVD subtitles, and mailing address labels. Their use is sometimes also a stylistic choice, particularly in certain Internet slang.

The term "half-width kana", which strictly refers only to how kana are displayed and not to how they are stored, is also used loosely to refer to the A0–DF (hexadecimal) block where katakana are stored in some character encodings, such as JIS X 0201 (1969). This is formally incorrect: the JIS standard simply specifies that katakana be stored in these locations, without specifying how they should be displayed. The confusion arises because in early computing the characters stored there were in fact displayed as half-width kana.
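The conflation of storage and display is visible in how Unicode maps this byte range. The sketch below decodes the single bytes 0xA1–0xDF through Python's `shift_jis` codec, since Shift JIS stores the JIS X 0201 katakana at the same single-byte values; every result lands in U+FF61–U+FF9F, and the Unicode character names carry the word HALFWIDTH, describing an appearance the JIS standard itself never required.

```python
import unicodedata

# Single bytes 0xA1-0xDF: the JIS X 0201 katakana area, kept unchanged in Shift JIS.
decoded = [bytes([b]).decode("shift_jis") for b in range(0xA1, 0xE0)]

# Unicode maps them one-to-one onto U+FF61-U+FF9F (Halfwidth and Fullwidth Forms).
assert [ord(ch) for ch in decoded] == list(range(0xFF61, 0xFFA0))

for b in (0xA1, 0xB6, 0xDF):
    ch = bytes([b]).decode("shift_jis")
    print(f"0x{b:02X} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
# → 0xA1 -> U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP
# → 0xB6 -> U+FF76 HALFWIDTH KATAKANA LETTER KA
# → 0xDF -> U+FF9F HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
```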

In OpenType

OpenType defines the feature tags fwid (Full Widths), hwid (Half Widths), halt (Alternate Half Widths) and vhal (Alternate Vertical Half Metrics) for providing fullwidth or halfwidth forms or metrics of characters.

In Unicode

In Unicode, if a grapheme can be represented as either a fullwidth or a halfwidth character, it is said to have both a fullwidth form and a halfwidth form. Unicode assigns every character an "East Asian Width" property, which takes one of the following values:[4]

F (Fullwidth): wide variant with a compatibility normalisation to a naturally narrow character, e.g. fullwidth Latin script.
H (Halfwidth): narrow variant with a compatibility normalisation to a naturally wide character, e.g. half-width kana.
W (Wide): naturally wide character, e.g. Hiragana.
Na (Narrow): naturally narrow character, e.g. ISO Basic Latin alphabet.
A (Ambiguous): character included in East Asian DBCS codes but also in European SBCS codes, e.g. Greek alphabet. Duospaced behaviour can consequently vary.
N (Neutral): character that does not appear in East Asian DBCS codes, e.g. Devanagari.
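Python's `unicodedata.east_asian_width` returns these abbreviations directly. The sketch below checks one example character per class, matching the classes listed above, and then shows a hypothetical `cell_width` helper in which Ambiguous characters take one or two columns depending on a context flag, much as some terminal emulators offer an option for ambiguous-width characters. It deliberately ignores combining marks and control characters, which real terminals treat specially.

```python
import unicodedata

# One example character per East Asian Width class.
examples = {
    "\uff21": "F",   # FULLWIDTH LATIN CAPITAL LETTER A
    "\uff76": "H",   # HALFWIDTH KATAKANA LETTER KA
    "\u3042": "W",   # HIRAGANA LETTER A
    "A": "Na",       # LATIN CAPITAL LETTER A
    "\u03b1": "A",   # GREEK SMALL LETTER ALPHA
    "\u0915": "N",   # DEVANAGARI LETTER KA
}
for ch, expected in examples.items():
    assert unicodedata.east_asian_width(ch) == expected


def cell_width(ch: str, east_asian_context: bool) -> int:
    """Columns for one character; Ambiguous (A) depends on the display context."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("F", "W"):
        return 2
    if eaw == "A":
        return 2 if east_asian_context else 1
    return 1  # H, Na and N are single-width


print(cell_width("\u03b1", east_asian_context=True), cell_width("\u03b1", east_asian_context=False))
# → 2 1
```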

Unicode block

Halfwidth and Fullwidth Forms is also the name of the Unicode block U+FF00–FFEF, provided so that older encodings containing both halfwidth and fullwidth characters can be translated losslessly to and from Unicode. It is the last block of the Basic Multilingual Plane apart from the short Specials block at U+FFF0–FFFF.
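The fullwidth Latin part of the block mirrors printable ASCII at a fixed offset: U+FF01–U+FF5E correspond to U+0021–U+007E, a difference of 0xFEE0. The sketch below relies on that offset; the function name is hypothetical, and it substitutes U+3000 IDEOGRAPHIC SPACE for the space because the block has no fullwidth space of its own. NFKC folds the result back to ASCII, which is also why normalising text before converting it back to a legacy encoding would discard the distinction this block preserves.

```python
import unicodedata

FULLWIDTH_OFFSET = 0xFEE0  # U+FF01-U+FF5E mirror ASCII U+0021-U+007E


def to_fullwidth(text: str) -> str:
    """Map printable ASCII to fullwidth forms; the space becomes U+3000."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x21 <= code <= 0x7E:
            out.append(chr(code + FULLWIDTH_OFFSET))
        elif ch == " ":
            out.append("\u3000")  # IDEOGRAPHIC SPACE, outside this block
        else:
            out.append(ch)  # leave all other characters unchanged
    return "".join(out)


wide = to_fullwidth("Big5 & JIS!")
print(wide)
print(unicodedata.normalize("NFKC", wide))  # compatibility normalisation maps it back
```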

Halfwidth and Fullwidth Forms: official Unicode Consortium code chart (PDF), as of Unicode version 15.1

References

  1. ^ "Font spacing characteristics". IBM Knowledge Center. IBM Corporation. 1990. Retrieved 2017-09-17.
  2. ^ Fujimoto, Hajime (2013-03-05). 改訂新版スタイルシートポケットリファレンス [Style Sheet Pocket Reference, revised new edition] (in Japanese). p. 107. ISBN 978-4774154862.
  3. ^ "TSP100futurePRNT" (in Japanese). Star Micronics.
  4. ^ Lunde, Ken (2019-01-25). "Unicode® Standard Annex #11: East Asian Width". Unicode Consortium.
