Length Semantics

Previous
Previous
Next
Next

In single-byte character sets, the number of bytes and the number of characters in a string are the same. In multi-byte character sets, a character or code point consists of one or more bytes. Calculating the number of characters based on byte length can be difficult in a variable-width character set. Calculating column length in bytes is called byte semantics, while measuring column length in characters is called character semantics.

Character semantics is useful to define the storage requirements for multi-byte strings of varying widths. For example, in a Unicode database (AL32UTF8), suppose that you need to define a VARCHAR2 column that can store up to five Chinese characters together with five English characters. Using byte semantics, this column requires 15 bytes for the Chinese characters, which are 3 bytes long, and 5 bytes for the English characters, which are 1 byte long, for a total of 20 bytes. Using character semantics, the column requires 10 characters.

The expressions in the following list use byte semantics. Note the BYTE qualifier in the VARCHAR2 expression and the B suffix in the SQL function name.

The expressions in the following list use character semantics. Note the CHAR qualifier in the VARCHAR2 expression.

This section contains the following topic:

NLS_LENGTH_SEMANTICS Parameter

The NLS_LENGTH_SEMANTICS parameter specifies BYTE (default) or CHAR semantics. By default, the character datatypes CHAR and VARCHAR2 are specified in bytes, not characters. Therefore, the specification CHAR(20) in a table definition allows 20 bytes for storing character data.

The NLS_LENGTH_SEMANTICS parameter enables you to create CHAR, VARCHAR2, and LONG columns using either byte-length or character-length semantics. NCHAR, NVARCHAR2, CLOB, and NCLOB columns are always character-based. Existing columns are not affected.

Example: Setting Length Semantics and Creating a Table shows an example of creating a table. When the database character set is WE8MSWIN1252, the last_name column of the table can hold up to 10 Western European characters, occupying a maximum of 10 bytes. When the database character set is Unicode (AL32UTF8), the last_name column can still hold up to 10 Unicode characters regardless of the language; however, it can occupy a maximum of 40 bytes.

Setting Length Semantics and Creating a Table

-- reset NLS parameters back to default here
ALTER SESSION SET NLS_LANGUAGE = American NLS_TERRITORY = America;
CREATE TABLE temp_employees_table
( employee_id NUMBER(4), last_name VARCHAR2(10 CHAR), job_id VARCHAR2(9), 
  manager_id NUMBER(4), hire_date DATE, salary NUMBER(7,2), 
  department_id NUMBER(2)) ;
-- cleanup: drop the table
DROP TABLE temp_employees_table;


See Also:

"Length Semantics for Character Datatypes" in Oracle Database Concepts