SZNS/ISO 10646-1:2000 Information technology – Universal Multiple-Octet Coded Character Set (UCS)

E1,115.65



Description

Scope
SZNS ISO IEC 10646-1 specifies the Universal Multiple-Octet Coded Character Set (UCS). It is applicable to the representation, transmission, interchange, processing, storage, input, and presentation of the written form of the languages of the world as well as of additional symbols.
This part of SZNS ISO IEC 10646-1 specifies the overall architecture, and:
– Defines terms used in SZNS ISO IEC 10646-1;
– Describes the general structure of the coded character set;
– Specifies the Basic Multilingual Plane (BMP) of the UCS, and defines a set of graphic characters used in scripts and the written form of languages on a worldwide scale;
– Specifies the names for the graphic characters of the BMP, and their coded representations;
– Specifies the four-octet (32-bit) canonical form of the UCS: UCS-4;
– Specifies a two-octet (16-bit) BMP form of the UCS: UCS-2;
– Specifies the coded representations for control functions;
– Specifies the management of future additions to this coded character set.
The UCS is a coding system different from that specified in ISO/IEC 2022. The method to designate UCS from ISO/IEC 2022 is specified in 16.2.
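The two coded representation forms named in the list above can be illustrated with a minimal Python sketch. It is not taken from the standard: the function names `encode_ucs4` and `encode_ucs2` are hypothetical, and the sketch assumes that octets are serialized most significant first and that the input contains no surrogate code points.

```python
import struct

BMP_MAX = 0xFFFF  # highest code position in the Basic Multilingual Plane


def encode_ucs4(text: str) -> bytes:
    """Encode each character as four octets (assumed most significant first)."""
    return b"".join(struct.pack(">I", ord(ch)) for ch in text)


def encode_ucs2(text: str) -> bytes:
    """Encode each character as two octets; only BMP characters fit this form."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > BMP_MAX:
            # UCS-2 is the two-octet BMP form, so anything beyond the BMP is rejected.
            raise ValueError(f"U+{cp:04X} is outside the BMP and has no UCS-2 form")
        out += struct.pack(">H", cp)
    return bytes(out)


if __name__ == "__main__":
    print(encode_ucs4("A").hex())       # four octets per character
    print(encode_ucs2("A\u20ac").hex())  # two octets per character
```

In this sketch the same character occupies four octets in UCS-4 and two in UCS-2, and a character outside the BMP raises an error in `encode_ucs2` rather than being silently truncated.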
NOTE 1. – The Unicode Standard, Version 3.0, provides a set of characters, names, and coded representations that are identical with those in Part 1 of this International Standard. It additionally provides details of character properties, processing algorithms, and definitions that are useful to implementors.
NOTE 2. – It is intended that character code positions for additional scripts and symbols will be allocated in this Part 1 of this International Standard when sufficient input and review [...]
[...] implementation level, the adopted subset (by means of a list of collections and/or characters), and the selection of control functions adopted in accordance with clause 15.
a) Device description: A device that conforms to SZNS ISO IEC 10646-1 shall be the subject of a description that identifies the means by which the user may supply characters to the device and/or may recognize them when they are made available to the user, as specified, respectively, in sub-clauses b) and c) below.
b) Originating device: An originating device shall allow its user to supply any characters from an adopted subset, and be capable of transmitting their coded representations within a CC-data-element in accordance with the adopted form and implementation level.
c) Receiving device: A receiving device shall be capable of receiving and interpreting any coded representation of characters that are within a CC-data-element in accordance with the adopted form and implementation level, and shall make any corresponding characters from the adopted subset available to the user in such a way that the user can identify them.
Any corresponding characters that are not within the adopted subset shall be indicated to the user. The way used for indicating them need not distinguish them from each other.
NOTE 1. – An indication to the user may consist of making available the same character to represent all characters not in the adopted subset, or providing a distinctive audible or visible signal when appropriate to the type of user.
NOTE 2. – See also annex J for receiving devices with retransmission capability.
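The receiving-device behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the function `present`, its parameters, and the choice of "?" as the indicator are hypothetical, and the adopted subset is modelled simply as a set of code positions.

```python
from typing import AbstractSet


def present(received: str, adopted_subset: AbstractSet[int], indicator: str = "?") -> str:
    """Return the text as made available to the user.

    Characters in the adopted subset are passed through unchanged. Every other
    character is replaced by the same indicator, which is permitted because the
    indication need not distinguish such characters from each other.
    """
    return "".join(
        ch if ord(ch) in adopted_subset else indicator
        for ch in received
    )


if __name__ == "__main__":
    # Hypothetical adopted subset: the printable Basic Latin range.
    basic_latin = set(range(0x20, 0x7F))
    print(present("A\u03b2\u03b3!", basic_latin))
```

A visible placeholder is only one of the options mentioned in the note; a device could equally signal unsupported characters audibly or by some other distinctive means.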

TABLE OF CONTENTS

Content
1 Scope
2 Conformance
3 Normative References
4 Definitions
5 General structure of UCS
6 Basic structure and Nomenclature
7 General requirements for the UCS
8 The Basic Multilingual Plane
9 Other planes
10 Private use groups, planes and zones
11 Revision and updating of the UCS
12 Subsets
13 Coded representation forms of the UCS
14 Implementation levels
15 Use of control functions with the UCS
16 Declaration of identification of features
17 Structure of the code tables and lists
18 Block names
19 Characters in bi-directional context
20 Special characters
21 Presentation forms of characters
22 Compatibility characters
23 Order of characters
24 Combining characters
25 Special features of individual scripts
26 Code tables and lists of character names
27 CJK unified ideographs
Annex A Collections of graphic characters for subsets
Annex B List of combining characters
Annex C Transformation format for 16 planes of Group 00 (UTF-16)
Annex D UCS Transformation Format 8 (UTF-8)
Annex E Mirrored characters in Arabic bi-directional context
Annex F Alternate format characters
Annex G Alphabetically sorted list of character names
Annex H The use of “signatures” to identify UCS
Annex I Recommendation for combined receiving/originating devices with internal storage
Annex J Notations of octet value representations
Annex K Character naming guidelines
Annex L Sources of characters
Annex M External references to character repertoires
Annex N Additional information on characters
Annex O Code mapping table for Hangul syllables
Annex P Names of Hangul syllables
Annex Q Procedure for the unification and arrangement of CJK ideographs
Annex R Alternate format characters
Annex S Alternate format characters

Additional information

Format

PDF, Hardcopy