Natural String Sorting
Have you ever desired to have in your code a way to order a sequence of strings in the same way as Windows does for files whose name contains a mix of letters and numbers? Natural string sorting is not natively supported in .NET but can be easily implemented by specialising a string comparer and adding a few extensions to the enumerable string collection.
Ordinal numbers are the words representing the rank of a number with respect to some order, in particular order or position (i.e. first, second, third, etc.).
As humans, we learn to put numbers in the correct order as a primary skill, according to their ordinality. We can also order strings containing letters and numbers very easily just by looking at the alphabetical order of letters and the ordinality of a number. For example, the string “Year 1” precedes “Year 2”, which also comes before “Year 10” in this natural order. Basically, we know how to separate the alphabetical part of the sentence from the numerical one, and then we are able to sort the full string by applying two different “sorting algorithms” (alphabetical and ordinal, respectively). Variations of these algorithms may apply in different cultures, but the principle is the same.
Computers, on the converse, use the ASCII table for sorting strings, from the dawn of the informatics era. This table presents a sequence of letters, numbers, punctuation and control characters in a fixed order that tends to mirror the natural order of the alphabet and numbers, and associates every entry in the table (a character) to a number. For example, number 65 corresponds to character “A”, “B” is 66, “C” is 67 and so on. Likewise, also characters 0 to 9 have an ASCII number: “0” is 48, “1” is 49 until “9”, which is 57. Numbers after 9 corresponds to two or more characters (10 is “1” and “0”).
Sorting characters according to the ASCII table, which is what computers do, basically implies sorting each character individually according to their position in the table: same “sorting algorithm” irrespective of the type of the character, whether letter, number, punctuation character or other.
By applying this logic, strings are compared character by character and then sorted according to the numerical position of the character in the ASCII table. The string “Year 1” would still be ordered before “Year 2”, as character “1” appears before “2” in the table. But “Year 10” is before “Year 2” because the “1” character in “10” is compared first with “2” and, again, “1” takes precedence over “2”, irrespective of the following characters.
Computers use the ASCII table for sorting strings
Let’s make an example in C# of displaying a sequence of strings in the standard order supported by the OrderBy() LINQ extension to enumerable collections.
var files = new List<string>
{
"File 1.txt",
"File 2.jpg",
"File 3.doc",
"File 10.txt",
"File 11.csv",
"File 20.xls",
"File 21.ppt"
};
foreach (var file in files.OrderBy(n => n))
{
Console.WriteLine(file);
}
String ordering with LINQ
This simple snippet of C# code orders all strings in the “files” list by using the OrderBy() LINQ extension, and then prints the result on the output console. As it is visible in the following picture, the output of this operation is a list of file names sorted according to the ASCII rule of order of characters.
Output of the ordering program
Not the result we would like to have, and definitely not so user-friendly. So, what can we do about it?
Sticking to the example of the file names, Windows already supports natural sorting of files by name, as in the following screenshot.
Natural file sorting by name in Windows
Much better, isn’t it? So, Windows is clever enough to think like a human and understand the alphabetical and numerical part of a string separately. Can we do the same in C#?
The approach to take for introducing natural ordering of a collection of strings in C# starts from the implementation of a string comparer that contains the logic for understanding what is alphabetical and what is numerical within a string. After that, we would want to introduce a few extensions in the same style as LINQ does, for having a shortcut in an enumerable collection of strings to the natural sorting methods.
We have seen in the previous example that the easiest way to sort a collection of strings is to use the LINQ extension OrderBy() on an enumerable object. The OrderBy() method has an overload that accepts an IComparer<TKey> for specifying a bespoke way of ordering objects of type TKey.
The signature of this overload is:
public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector,
IComparer<TKey> comparer);
The key to perform custom comparisons of strings is to implement the IComparer interface, then. This interface exposes only one method, Compare, with the following signature:
int Compare(T x, T y);
The implementation of Compare in our custom comparer class should perform a comparison of the two objects and return a value indicating whether one is less than, equal to, or greater than the other. A value less than zero is returned if x is less than y; zero is returned is x equals y; a value greater than zero is returned if x is greater than y. Comparing null with any reference type is allowed and does not generate an exception. A null reference is considered to be less than any reference that is not null.
public class NaturalStringComparer : IComparer<string>
{
private bool _ignoreCase = true;
public NaturalStringComparer() : this(true)
{
}
public NaturalStringComparer(bool ignoreCase)
{
_ignoreCase = ignoreCase;
}
public int Compare(string x, string y)
{
// full implementation on the attached solution
}
}
Implementation of a natural string comparer in C#
The NaturalStringComparer implements the IComparer<string> interface, and specifically its method Compare(), and also presents an overloaded constructor for discerning between a case-sensitive or insensitive comparison. Clever! J
We can already use our NaturalStringComparer when ordering a collection of strings via the OrderBy() extension; with reference to the list of file names presented at the beginning of this article, we can simply use the overload of the OrderBy() method that accepts an IComparer in input for obtaining the desired result:
foreach (var file in files.OrderBy(n => n, new NaturalStringComparer()))
{
Console.WriteLine(file);
}
This generates the following result, which is exactly what we want:
Output of the natural ordering program
If we were to order the file names in descending order, we would use the OrderByDescending() extension with the explicit call to NaturalStringComparer:
foreach (var file in files.OrderByDescending(n => n, new NaturalStringComparer()))
{
Console.WriteLine(file);
}
What about a more concise way of ordering a collection of strings by adding two more extensions?
· OrderByNatural
· OrderByNaturalDescending
The signature for these extensions, and their overload for allowing case-sensitive sorting, is the following; remember that extension methods must be declared as public static in a static class:
public static class StringExtensions
{
public static IOrderedEnumerable<string> OrderByNatural(
this IEnumerable<string> strings,
Func<string, string> keySelector);
public static IOrderedEnumerable<string> OrderByNatural(
this IEnumerable<string> strings,
Func<string, string> keySelector,
bool ignoreCase);
public static IOrderedEnumerable<string> OrderByNaturalDescending(
this IEnumerable<string> strings,
Func<string, string> keySelector);
public static IOrderedEnumerable<string> OrderByNaturalDescending(
this IEnumerable<string> strings,
Func<string, string> keySelector,
bool ignoreCase);
}
LINQ-style extensions for ordering strings like human beings do
Internally, each method invokes the OrderBy() and OrderByDescending() extensions accordingly, with an explicit call to the NaturalStringComparer class, using the respective overload for ignoring case-sensitive comparison. So, in our file names example, sorting a list of strings by ordinal order would be as simple as:
foreach (var file in files.OrderByNatural(n => n))
{
Console.WriteLine(file);
}
Clever 2.0! J
Project Name: NaturalStringOrdering