SZNS ISO IEC 10646-1 specifies the Universal Multiple-Octet Coded Character Set (UCS). It is applicable to the representation, transmission, interchange, processing, storage, input, and presentation of the written form of the languages of the world as well as of additional symbols.
This part of SZNS ISO IEC 10646-1 specifies the overall architecture, and:
– Defines terms used in SZNS ISO IEC 10646-1;
– Describes the general structure of the coded character set;
– Specifies the Basic Multilingual Plane (BMP) of the UCS, and defines a set of graphic characters used in scripts and the written form of languages on a worldwide scale;
– Specifies the names for the graphic characters of the BMP, and their coded representations;
– Specifies the four-octet (32-bit) canonical form of the UCS: UCS-4;
– Specifies a two-octet (16-bit) BMP form of the UCS: UCS-2;
– Specifies the coded representations for control functions;
– Specifies the management of future additions to this coded character set.
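The two coded representation forms listed above can be illustrated with a minimal Python sketch. It assumes big-endian octet order, and the helper names `to_ucs4` and `to_ucs2` are hypothetical, not part of the standard. It only shows that UCS-4 uses four octets per character, while UCS-2 uses two octets and is limited to the BMP.

```python
import struct

BMP_MAX = 0xFFFF  # last code position of the Basic Multilingual Plane


def to_ucs4(text: str) -> bytes:
    """Encode each character as four octets (big-endian order is an assumption)."""
    return b"".join(struct.pack(">I", ord(ch)) for ch in text)


def to_ucs2(text: str) -> bytes:
    """Encode each character as two octets; only BMP characters fit."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > BMP_MAX:
            # UCS-2 has no way to represent code positions beyond the BMP.
            raise ValueError(f"U+{cp:04X} is outside the BMP")
        out.append(struct.pack(">H", cp))
    return b"".join(out)


if __name__ == "__main__":
    print(to_ucs4("A").hex())  # four octets per character
    print(to_ucs2("A").hex())  # two octets per character
```

A character outside the BMP, such as `"\U00010000"`, is accepted by `to_ucs4` but rejected by `to_ucs2` in this sketch.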
The UCS is a coding system different from that specified in ISO/IEC 2022. The method to designate UCS from ISO/IEC 2022 is specified in 16.2.
NOTE 1. – The Unicode Standard, Version 3.0, provides a set of characters, names, and coded representations that are identical with those in Part 1 of this International Standard. It additionally provides details of character properties, processing algorithms, and definitions that are useful to implementors.
NOTE 2. – It is intended that character code positions for additional scripts and symbols will be allocated in this Part 1 of this International Standard when sufficient input and review [...] implementation level, the adopted subset (by means of a list of collections and/or characters), and the selection of control functions adopted in accordance with clause 15.
a) Device description: A device that conforms to SZNS ISO IEC 10646-1 shall be the subject of a description that identifies the means by which the user may supply characters to the device and/or may recognize them when they are made available to the user, as specified, respectively, in sub-clauses b) and c) below.
b) Originating device: An originating device shall allow its user to supply any characters from an adopted subset, and be capable of transmitting their coded representations within a CC-data-element in accordance with the adopted form and implementation level.
c) Receiving device: A receiving device shall be capable of receiving and interpreting any coded representation of characters that are within a CC-data-element in accordance with the adopted form and implementation level, and shall make any corresponding characters from the adopted subset available to the user in such a way that the user can identify them.
Any corresponding characters that are not within the adopted subset shall be indicated to the user. The way used for indicating them need not distinguish them from each other.
NOTE 1. – An indication to the user may consist of making available the same character to represent all characters not in the adopted subset, or providing a distinctive audible or visible signal when appropriate to the type of user.
NOTE 2. – See also annex J for receiving devices with retransmission capability.
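The receiving-device behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the standard's own procedure: the function name `present` is hypothetical, and the choice of U+FFFD as the single indicator character is an assumption (the text only requires that characters outside the adopted subset be indicated, not that they be distinguished from each other).

```python
# Assumption: one substitute character stands in for every character
# that is not in the adopted subset. U+FFFD is chosen for illustration only.
SUBSTITUTE = "\ufffd"


def present(received: str, adopted_subset: frozenset, substitute: str = SUBSTITUTE) -> str:
    """Return the text as it would be made available to the user.

    Characters in the adopted subset pass through unchanged; all others
    are replaced by the same indicator character.
    """
    return "".join(ch if ch in adopted_subset else substitute for ch in received)


if __name__ == "__main__":
    subset = frozenset("abcdefghijklmnopqrstuvwxyz ")
    print(present("price in €", subset))
```

Using one indicator for all unsupported characters matches the first option in the note above; a device could equally signal them audibly or visually.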
TABLE OF CONTENTS
|3 Normative References||2|
|5 General structure of UCS||4|
|6 Basic structure and nomenclature||4|
|7 General requirements for the UCS||8|
|8 The Basic Multilingual Plane||8|
|9 Other planes||8|
|10 Private use groups, planes and zones||8|
|11 Revision and updating of the UCS||9|
|13 Coded representation forms of the UCS||9|
|14 Implementation levels||9|
|15 Use of control functions with the UCS||10|
|16 Declaration of identification of features||10|
|17 Structure of the code tables and lists||11|
|18 Block names||12|
|19 Characters in bi-directional context||12|
|20 Special characters||12|
|21 Presentation forms of characters||13|
|22 Compatibility characters||13|
|23 Order of characters||13|
|24 Combining characters||13|
|25 Special features of individual scripts||14|
|26 Code tables and lists of character names||15|
|27 CJK unified ideographs||304|
|Annex A Collections of graphic characters for subsets||879|
|Annex B List of combining characters||885|
|Annex C Transformation format for 16 planes of Group 00 (UTF-16)||890|
|Annex D UCS Transformation Format 8 (UTF-8)||893|
|Annex E Mirrored characters in Arabic bi-directional context||897|
|Annex F Alternate format characters||899|
|Annex G Alphabetically sorted list of character names||904|
|Annex H The use of “signatures” to identify UCS||951|
|Annex I Recommendation for combined receiving/originating devices with internal storage|||
|Annex J Notations of octet value representations||953|
|Annex K Character naming guidelines||954|
|Annex L Sources of characters||956|
|Annex M External references to character repertoires||959|
|Annex N Additional information on characters||961|
|Annex O Code mapping table for Hangul syllables||964|
|Annex P Names of Hangul syllables||974|
|Annex Q Procedure for the unification and arrangement of CJK ideographs||897|
|Annex R Alternate format characters||899|
|Annex S Alternate format characters||889|