Working Draft:
The Printer Working Group Standard for Preferred Character Repertoires in Printers

Elliott Bradshaw, 2/28/03

Abstract

In traditional printing environments, clients rely on font downloads when they are not sure that a given character is embedded in the printer. As printing moves to small clients, downloading may not be an option, so clients need to know which characters are available in a given device.

There are many published named character repertoires, and a small client cannot be expected to know them all. For interoperability, this document defines a small set of character repertoires as "preferred," so that a complying client and printer can interoperate knowing only those repertoires. It also defines a naming convention so that a printer may advertise support for these and other named repertoires.

The primary target of this document is printing using languages based on XML or HTML (for example, XHTML-Print).  It will be less applicable to traditional PDLs (PCL, PostScript, etc.) because they tend to have very language-specific mechanisms for managing character repertoires.

To Do: Front Matter

Open issues for discussion are shown with a yellow background and a border in the formatted draft; in plain text they are the paragraphs prefixed with "ISSUE:".

Terminology

In Unicode and W3C documents, the term "character set" usually refers to a method of encoding a (possibly very large) set of characters, e.g. UTF-8. An encoding tells how to represent a given character if it is present, but does not define which characters in that space are actually in use.

The term "character repertoire" is used here to indicate a subset of characters that is actually present. It is convenient to specify a character repertoire using Unicode characters; in principle, however, a character repertoire could be expressed in a different encoding.

Repertoire Naming

This specification derives repertoire names from several sources.  To avoid ambiguity, the PWG name for a repertoire indicates the source.

The categories of naming are:

    Source                     Examples
    -------------------------  ---------------------
    Unicode chart              Unicode Latin 1
                               Unicode Cyrillic
    Unicode Unihan database    Unihan JIS X 0208
                               Unihan KPS 10721-2000
    IANA charset registry      Charset iso-8859-1
    Vendor extension           Vendor Oak Floral

Unicode charts are as described in:

    http://www.unicode.org/charts

ISSUE: Unicode uses the code chart names for presentation, but may not intend these names to be normative.  Yet, their meaning and use are clear on unicode.org.  OK to refer to them in this way?

Unicode Unihan database mapped character set names are as described in:

    http://www.unicode.org/charts/unihan.html

IANA charset names are as described in:

    http://www.iana.org/assignments/character-sets

Note that IANA charsets are in a variety of encodings, not necessarily Unicode.  If a non-Unicode repertoire is used in a Unicode context, the implication is that the corresponding Unicode codepoints are used.  Mappings are available for most IANA charsets, but this is outside the scope of this document.
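
As an illustration of how a non-Unicode repertoire implies a set of Unicode codepoints, the following is a minimal sketch and not part of this specification. It assumes a single-byte charset and relies on the mapping tables built into Python's standard codecs; the function name charset_repertoire is hypothetical.

```python
def charset_repertoire(charset):
    """Return the Unicode code points reachable from a single-byte charset.

    Assumption: `charset` names a single-byte encoding known to Python's
    codec registry (for example "iso-8859-1"). Multi-byte charsets would
    need a different enumeration strategy.
    """
    codepoints = set()
    for byte in range(256):
        try:
            char = bytes([byte]).decode(charset)
        except UnicodeDecodeError:
            continue  # this byte value is unassigned in the charset
        codepoints.add(ord(char))
    return frozenset(codepoints)


if __name__ == "__main__":
    cyrillic = charset_repertoire("iso-8859-5")
    # U+0410 is CYRILLIC CAPITAL LETTER A, encoded as byte 0xB0 in ISO 8859-5.
    print(0x0410 in cyrillic)
```

Because the result is expressed in Unicode codepoints, it can be compared directly with repertoires drawn from the other naming sources.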

In matching names, the client should consider these rules:

  1. Names are case-insensitive, so a letter should match its upper/lower case equivalent.

  2. Space, hyphen, and underscore characters are interchangeable.

Individual transport protocols may place further restrictions on the use of upper/lower case, and the use of space, hyphen, and underscore characters.
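
The two matching rules above can be sketched as a normalization step applied to both names before comparison. This is one possible reading, not a normative algorithm: the function names are hypothetical, and the sketch assumes that separators are interchangeable one for one (a run of two spaces is not treated as equal to a single hyphen).

```python
import re


def normalize_repertoire_name(name):
    """Map a repertoire name to a canonical form for comparison."""
    # Rule 2: space, hyphen, and underscore are interchangeable,
    # so rewrite each of them to a single canonical separator.
    canonical = re.sub(r"[ _-]", " ", name)
    # Rule 1: names are case-insensitive.
    return canonical.lower()


def names_match(a, b):
    """Return True if two repertoire names are equal under rules 1 and 2."""
    return normalize_repertoire_name(a) == normalize_repertoire_name(b)


if __name__ == "__main__":
    print(names_match("Charset iso-8859-1", "charset_ISO_8859_1"))
```

A transport protocol that restricts case or separators could apply the same normalization before placing a name on the wire.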

Preferred Character Repertoires

The following are the PWG Preferred Repertoires:

ISSUE: Alternatively, we could prefer non-Asian repertoire names based on ISO-8859 rather than Unicode.

The last four support PRC, Japan, Korea, and Taiwan respectively.

A conforming printer must follow these rules:

  1. A preferred repertoire must be supported whenever others in the same script are.  For example, if any Japanese repertoires are supported, then Unihan JIS X 0208 must also be supported.
  2. All printers must support Unicode Basic Latin.
  3. If a printer supports a repertoire, it must be able to render all characters from the repertoire, in all fonts.  However, the rendering need not be different for different fonts.  For example, in many cases a symbol character will appear the same in all fonts.
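
Rules 1 and 2 lend themselves to a mechanical check of a printer's advertised list. The sketch below is illustrative only and rests on stated assumptions: the script labels attached to each repertoire are hypothetical (this document does not define them), the preferred table holds only the Japanese pairing named in rule 1's example, and "Vendor Example Kanji" is an invented vendor name. Rule 3 concerns rendering and is not checked.

```python
REQUIRED_EVERYWHERE = "Unicode Basic Latin"

# Assumption: only the pairing given as an example in rule 1 is listed.
PREFERRED_FOR_SCRIPT = {"Japanese": "Unihan JIS X 0208"}


def conformance_problems(advertised):
    """Check rules 1 and 2 against a printer's advertised repertoires.

    `advertised` maps each repertoire name to a script label. The labels
    are a hypothetical device for this sketch.
    """
    problems = []
    # Rule 2: every printer must support Unicode Basic Latin.
    if REQUIRED_EVERYWHERE not in advertised:
        problems.append("missing " + REQUIRED_EVERYWHERE)
    # Rule 1: if any repertoire of a script is supported, the preferred
    # repertoire for that script must be supported as well.
    for script in sorted(set(advertised.values())):
        preferred = PREFERRED_FOR_SCRIPT.get(script)
        if preferred is not None and preferred not in advertised:
            problems.append("missing " + preferred + " for " + script)
    return problems


if __name__ == "__main__":
    printer = {
        "Unicode Basic Latin": "Latin",
        "Vendor Example Kanji": "Japanese",  # hypothetical vendor repertoire
    }
    for problem in conformance_problems(printer):
        print(problem)
```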

As a result, a client that knows only the preferred repertoires can still expect good results.

There is no requirement that every supported character be represented in some repertoire; a printer may support specific characters without advertising them. In some languages (e.g. those based on XHTML) certain characters are implicitly supported (e.g. as built-in character entities) without being advertised in any repertoire.

Print Client Operation

Printing protocols (outside of this document) specify how a print client learns about the supported repertoires in a printer.  Once it knows, a client may choose to use this knowledge in any of these ways:

  1. If multiple printers are available, look for one that can print all the characters in a job.
  2. If printing to a printer that can't print all the characters in a job, warn the user.
  3. Make a substitution for a character that won't print.
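
The three client behaviors above can be sketched as follows. This is a hedged illustration, not a prescribed client design. The REPERTOIRES table is an assumption: it approximates two Unicode chart repertoires by code point ranges (printable Basic Latin as U+0020 to U+007E, and the Cyrillic block as U+0400 to U+04FF), whereas a real client would carry the actual contents of the repertoires it knows. The function names, printer names, and the "?" replacement character are likewise hypothetical.

```python
# Assumption: repertoire contents approximated by code point ranges.
REPERTOIRES = {
    "Unicode Basic Latin": frozenset(range(0x20, 0x7F)),
    "Unicode Cyrillic": frozenset(range(0x0400, 0x0500)),
}


def unprintable(job_text, supported_names):
    """Return the characters of the job not covered by any known repertoire."""
    covered = set()
    for name in supported_names:
        # Repertoire names the client does not know contribute nothing.
        covered |= REPERTOIRES.get(name, frozenset())
    return {ch for ch in job_text if ord(ch) not in covered}


def choose_printer(job_text, printers):
    """Behavior 1: pick the first printer that can print every character."""
    for printer, supported_names in printers.items():
        if not unprintable(job_text, supported_names):
            return printer
    return None  # Behavior 2: the caller should warn the user.


def substitute(job_text, supported_names, replacement="?"):
    """Behavior 3: replace each character that will not print."""
    missing = unprintable(job_text, supported_names)
    return "".join(replacement if ch in missing else ch for ch in job_text)


if __name__ == "__main__":
    printers = {
        "lobby": ["Unicode Basic Latin"],
        "lab": ["Unicode Basic Latin", "Unicode Cyrillic"],
    }
    job = "Привет, world"
    print(choose_printer(job, printers))
    print(substitute(job, printers["lobby"]))
```

Since a printer may support characters it does not advertise, a result from unprintable means only that the client cannot confirm support, which is why warning or substituting remains the client's choice.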

To Do: End Matter

Registration Procedures for Additional Names

Internationalization Considerations

Security Considerations

References

Author's Address

Additional Contributors