Linguistic Sorting and Searching

Different languages have their own sorting rules. Some languages are collated according to the letter sequence in the alphabet, some according to the number of stroke counts in the letter, and some are ordered by the pronunciation of the words. Treatment of letter accents also differs among languages. For example, in Danish, Æ is sorted after Z, while Y and Ü are considered to be variants of the same letter.

You can define how to sort data by using linguistic sort parameters. The basic linguistic definition treats strings as sequences of independent characters.

This section contains the following topics:

NLS_SORT Parameter
NLS_COMP Parameter
Case-Insensitive and Accent-Insensitive Searching

NLS_SORT Parameter

The NLS_SORT parameter specifies the collating (linguistic sort) sequence for ORDER BY queries. It overrides the default NLS_SORT value that is derived from the NLS_LANGUAGE parameter. The value of NLS_SORT can be BINARY or any valid linguistic sort name:

NLS_SORT = BINARY | sort_name

If the value is BINARY, then the collating sequence is based on the numeric code of the characters in the underlying encoding scheme. Depending on the datatype, this will either be in the binary sequence order of the database character set or the national character set. If the value is a named linguistic sort, sorting is based on the order of the defined sort. Most, but not all, languages supported by the NLS_LANGUAGE parameter also support a linguistic sort with the same name.

Spain traditionally treats ch, ll, and ñ as letters of their own, ordered after c, l, and n, respectively. Example: Setting NLS_SORT to BINARY and Example: Setting NLS_SORT to Spanish illustrate the effect of using a Spanish sort against the employee names Chen and Chung. In Example: Setting NLS_SORT to BINARY, the NLS_SORT parameter is set to BINARY.

In Example: Setting NLS_SORT to BINARY, the LIKE comparison operator is used to specify the records to return with the query. For information about LIKE, see "Restricting Data Using the WHERE Clause".

Setting NLS_SORT to BINARY

-- set the NLS_SORT for this user session
ALTER SESSION SET NLS_SORT=binary;

-- select the last name of those employees whose last name begin with C
SELECT last_name FROM employees
   WHERE last_name LIKE 'C%' ORDER BY last_name;

The output from the example should be similar to the following:

LAST_NAME
--------------
Cabrio
Cambrault
Cambrault
Chen
Chung
Colmenares

In Example: Setting NLS_SORT to Spanish, the NLS_SORT parameter is set to SPANISH_M.

Setting NLS_SORT to Spanish

-- set the NLS_SORT for this user session
ALTER SESSION SET NLS_SORT=spanish_m;

-- select the last name of those employees whose last name begin with C
SELECT last_name FROM employees 
   WHERE last_name LIKE 'C%' ORDER BY last_name;

The output from the example should be similar to the following:

LAST_NAME
--------------
Cabrio
Cambrault
Cambrault
Colmenares
Chen
Chung

Note that the order of last names in the output from the SELECT statement in Example: Setting NLS_SORT to BINARY and Example: Setting NLS_SORT to Spanish is different.

NLS_COMP Parameter

When using comparison operators, characters are compared according to their binary codes in the designated encoding scheme. A character is greater than another if it has a higher binary code. Because the binary sequence of characters may not match the linguistic sequence for a particular language, those comparisons might not be linguistically correct.

The value of the NLS_COMP parameter affects the comparison behavior of SQL operations. The value can be BINARY (default) or LINGUISTIC. You can use the NLS_COMP parameter to avoid the cumbersome process of using the NLSSORT function in SQL statements when you want to perform a linguistic comparison instead of a binary comparison. When NLS_COMP is set to LINGUISTIC, SQL performs a linguistic comparison based on the value of the NLS_SORT parameter.

Example: Setting NLS_COMP to BINARY and Example: Setting NLS_COMP to LINGUISTIC illustrate the effect of performing a binary comparison follow by a Spanish linguistic sensitive comparison against the employee names. In Example: Setting NLS_COMP to BINARY the NLS_COMP parameter is set to BINARY while NLS_SORT is set to Spanish.

Setting NLS_COMP to BINARY

-- set NLS_SORT and NLS_COMP for this user session
ALTER SESSION SET NLS_SORT=spanish_m NLS_COMP=binary;

-- select the last name of those employees whose last name begin with C
SELECT last_name FROM employees 
  WHERE last_name LIKE 'C%';

The output from the example should be similar to the following:

LAST_NAME
--------------
Cabrio
Cambrault
Cambrault
Chen
Chung
Colmenares

In Example: Setting NLS_COMP to LINGUISTIC the NLS_COMP parameter is set to LINGUISTIC while NLS_SORT is set to Spanish.

Setting NLS_COMP to LINGUISTIC

-- set NLS_SORT and NLS_COMP for this user session
ALTER SESSION SET NLS_SORT=spanish_m NLS_COMP=linguistic;

-- select the last name of those employees whose last name begin with C
SELECT last_name FROM employees 
  WHERE last_name LIKE 'C%';

The output from the example should be similar to the following:

LAST_NAME
--------------
Cabrio
Cambrault
Cambrault
Colmenares

Note the difference in the output from Example: Setting NLS_COMP to BINARY and Example: Setting NLS_COMP to LINGUISTIC. In Spanish ch is treated as a separate character that follows c, so ch is excluded when a Spanish linguistic-sensitive comparison is performed in Example: Setting NLS_COMP to LINGUISTIC.

Case-Insensitive and Accent-Insensitive Searching

Operations inside of a database are sensitive to the case and the accents of the characters. Sometimes, you might need to perform case-insensitive or accent-insensitive comparisons. Use the NLS_SORT session parameter to specify a case-insensitive or accent-insensitive sort.

To specify a case-insensitive or accent-insensitive sort:

Append _CI to an Oracle sort name for a case-insensitive sort. For example:

BINARY_CI: accent-sensitive and case-insensitive binary sort

GENERIC_M_CI: accent-sensitive and case-insensitive GENERIC_M sort
Append _AI to an Oracle sort name for an accent-insensitive and case-insensitive sort. For example:

BINARY_AI: accent-insensitive and case-insensitive binary sort

FRENCH_M_AI: accent-insensitive and case-insensitive FRENCH_M sort