Software engineers are generally familiar with Sets as something that they learn in mathematics at university. They may also have learned about HashTables, a data structure that stores keys and values where keys are unique and values are stored at a location in memory that is the Hash of the key.

I recently discovered that C# has a datastructure called HashSet. It is a datastructure that stores unique or distinct elements like a Set, but probably named a HashSet as under the hood it is implemented like a HashMap or a HashTable.

It has some interesting methods too.

    public static void Main(string[] args)
    {
        HashSetDemo();
    }
    
    static void HashSetDemo()
    {
        var numbersArray = new [] { 1, 2, 2, 2, 3, 3, 4, 4, 9, 9 };
        // a data structure to do Set operation
        // a hashset is a collection of distinct elements
        // doesn't add duplicates, hence gets rid of duplicate elements
        var setOfNumbers = new HashSet<int>(numbersArray);
        Console.WriteLine("Initialising a HashSet<int> with { 1, 2, 2, 2, 3, 3, 4, 4, 9, 9 }");
        foreach (var element in setOfNumbers)
        {
            Console.WriteLine($"{element}");
        }
        var numbers1Array = new[] { 12, 15, 13, 19, 19, 21 };
        // UnionWith adds elements of the param to the method to the set 
        setOfNumbers.UnionWith(numbers1Array);
        Console.WriteLine("After {1, 2, 3, 4, 9}.UnionWith({ 12, 15, 13, 19, 19, 21 })");
        foreach(var element in setOfNumbers)
        {
            Console.WriteLine($"{element}");
        }
        // ExceptWith() removes elements of the param from the set
        setOfNumbers.ExceptWith(new int[] { 13, 9 });
        Console.WriteLine("After {1, 2, 3, 4, 9, 12, 15, 13, 19, 21}.ExceptWith({ 13, 9 })");
        foreach (var element in setOfNumbers)
        {
            Console.WriteLine($"{element}");
        }
        // another slightly weird method, called SymmetricExceptWith
        // removes elements from the arry only if both sets have the element. 
        // in other words, hashSet1.SymmetricExceptWith(hashSet2) removes the intersecting
        // elements, i.e. the ones that are in both set 1 and set2
        var numbers2Array = new int[] { 10, 11, 12, 13, 14 };
        var newSetOfNumbers = new HashSet<int>(numbers1Array);
        newSetOfNumbers.SymmetricExceptWith(numbers2Array);
        Console.WriteLine("After { 12, 15, 13, 19, 19, 21 }.SymmetricExceptWith({ 10, 11, 12, 13, 14 })");
        foreach (var element in newSetOfNumbers)
        {
            Console.WriteLine($"{element}");
        }
    }

Let us take a look at the methods that I have used in the little demo script.

UnionWith

Plain old Set Union operation. Adds elements of one set to another. And you could Union a set with any data structure that implements an IEnumerable. Thus you could Union a HashSet, which already has no duplicate elements, with an array with some duplicates and the resulting HashSet would be a Union of both without any duplicates!

ExceptWith

As the name suggests, this one is called on a HashSet and takes in an IEnumerable and the result is the removal of the elements that were in the IEnumerable that was given as argument to the method.

SymmetricExceptWith

I didn’t see this coming and I had no idea what it would do until I tried it. It just popped up as an intellisense thing. This one again, takes in an IEnumerable, however, the result is a HashSet without the intersection of the elements in the HashSet on which it was invoked and the parameter to this method.

What about performance?

HashSet performs just like how you’d expect a HashTable to perform. It is best suited for Lookups and Deletes. Addition of elements, although in the worst case, can be O(n), best case is still O(1).

I have been writing C# code on and off for the past 5 years now and it is only last week that I used HashSet. The use case was to create a fake implementation of a datastore, representing data in a table. HashSets seem to imitate that very well, especially when your table data is pretty simple. In more complex scenarios, I would just find a way to use an InMemoryDatabase Provider.