Info Node: (texinfo)HTML Xref 8-bit Character Expansion

24.4.4 HTML Cross Reference 8-bit Character Expansion
-----------------------------------------------------

Usually, characters other than plain 7-bit ASCII are transformed into
the corresponding Unicode code point(s) in Normalization Form C, which
uses precomposed characters where available.  (This is the normalization
form recommended by the W3C and other bodies.)  This holds when that
code point is '0xffff' or less, as it almost always is.

  Each such code point is then further transformed by the rules above
into the string '_HHHH', where HHHH is the code point as four hex
digits.

  For example, combining this rule and the previous section:

     @node @b{A} @TeX{} @u{B} @point{}@enddots{}
     => A-TeX-B_0306-_2605_002e_002e_002e

  Notice: 1) '@enddots' expands to three periods, which in turn expand
to three '_002e's; 2) '@u{B}' is a 'B' with a breve accent, which does
not exist as a precomposed Unicode character and therefore expands to
'B_0306' (B with combining breve).
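
  The worked example above can be reproduced with a short Python
sketch.  This is not 'makeinfo''s implementation: the function name
'xref_escape' is hypothetical, the input is assumed to be the node name
after Texinfo commands have already been replaced by their characters,
and the treatment of ASCII characters (letters and digits kept, a space
turned into '-', anything else escaped) is an assumption drawn from the
rules of the earlier sections.  Only code points up to '0xffff' are
handled.

```python
import unicodedata


def xref_escape(text):
    """Escape expanded node-name text for an HTML cross-reference target.

    Hypothetical helper; handles code points up to 0xffff only.
    """
    out = []
    # Normalization Form C prefers precomposed characters where they exist.
    for ch in unicodedata.normalize("NFC", text):
        if ch.isascii() and ch.isalnum():
            out.append(ch)                 # ASCII letters and digits are kept
        elif ch == " ":
            out.append("-")                # assumed rule from earlier sections
        else:
            out.append("_%04x" % ord(ch))  # '_HHHH', four lowercase hex digits
    return "".join(out)


# '@b{A} @TeX{} @u{B} @point{}@enddots{}' after command expansion:
# 'B' + U+0306 (combining breve), U+2605 for @point, three periods.
expanded = "A TeX B\u0306 \u2605..."
print(xref_escape(expanded))
```

  Run as written, this prints 'A-TeX-B_0306-_2605_002e_002e_002e'.  An
accent that does have a precomposed form behaves differently: 'e'
followed by U+0301 is normalized to the single code point U+00E9 and so
becomes '_00e9' rather than 'e_0301'.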

  When the Unicode code point is above '0xffff', the transformation is
'__XXXXXX', that is, two leading underscores followed by six hex digits.
Since Unicode has declared that its highest code point is '0x10ffff',
this is sufficient.  (We felt it was better to define this extra escape
than to always use six hex digits, since the first two would nearly
always be zeros.)
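
  The choice between the two escapes depends only on the size of the
code point, as the following Python sketch illustrates.  The helper name
'escape_code_point' is hypothetical, and lowercase hex digits are
assumed, matching the '_002e' form used in the example of this section.

```python
def escape_code_point(cp):
    """Return the escape for one non-ASCII code point (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (cp,))
    if cp <= 0xFFFF:
        return "_%04x" % cp      # '_HHHH': one underscore, four hex digits
    return "__%06x" % cp         # '__XXXXXX': two underscores, six hex digits


print(escape_code_point(0x0306))    # combining breve
print(escape_code_point(0x1F600))   # a code point above 0xffff
print(escape_code_point(0x10FFFF))  # the highest code point still fits
```

  The three calls print '_0306', '__01f600' and '__10ffff'.  Because the
short form always has exactly one underscore and the long form exactly
two, a reader of the file name can tell how many hex digits follow.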

  This method works fine if the node name consists mostly of ASCII
characters and contains only a few 8-bit ones.  If the document is
written in a language whose script is not based on the Latin alphabet
(for example, Ukrainian), it will create file names consisting entirely
of '_HHHH' notations, which is inconvenient and all but unreadable.

  To handle such cases, 'makeinfo' offers the
'--transliterate-file-names' command line option.  This option enables
"transliteration" of node names into ASCII characters for the purposes
of file name creation and referencing.  The transliteration is based on
phonetic principles, which makes the generated file names more easily
understandable.

  For the definition of Unicode Normalization Form C, see Unicode report
UAX#15, <http://www.unicode.org/reports/tr15/>.  Many related documents
and implementations are available elsewhere on the web.

