Primitive type obsession

As professional developers, we are asked to automate behaviors by writing high-quality code. Our favorite programming languages allow us to do so by providing primitive types we can manipulate in our programs. Yet it can be easy to fall into the trap of primitive type obsession. This code smell occurs when we use primitive data types (string, int, bool, …) to model our domain concepts.

The problem

In one of my previous blog posts, I demonstrated how to create custom model validation in ASP.NET MVC, using an IP address as an example. I had the following property in my model:

public string IPAddress { get; set; }

This is clearly a case of primitive type obsession. Using a string to store an IP address can be dangerous, because the compiler will not prevent me from writing the following code:

model.IPAddress = null;
model.IPAddress = "";
model.IPAddress = "foobar";

In C#, a string can hold far more than an IP address. The latter is much more specific and should be designed accordingly. I will therefore use object-oriented programming and encapsulation to create a custom IPAddress type.

The refactoring

In the other article I explained that the address follows the IPv4 format, which is composed of 4 bytes, so let’s create a class representing this concept.

public class IPAddress
{
    public byte FirstByte { get; set; }
    public byte SecondByte { get; set; }
    public byte ThirdByte { get; set; }
    public byte FourthByte { get; set; }
}

Does it make sense to be able to instantiate an IPAddress object without setting the bytes? Not really, but for now it is possible. Let’s fix that!

public class IPAddress
{
    public byte FirstByte { get; private set; }
    public byte SecondByte { get; private set; }
    public byte ThirdByte { get; private set; }
    public byte FourthByte { get; private set; }
 
    public IPAddress(byte first, byte second, byte third, byte fourth)
    {
        FirstByte = first;
        SecondByte = second;
        ThirdByte = third;
        FourthByte = fourth;
    }
}

I now have an object that actually represents an IP address and that must be instantiated with all the data it requires. I also have a class where I can add related logic to ease its use.

public class IPAddress
{
    public byte FirstByte { get; private set; }
    public byte SecondByte { get; private set; }
    public byte ThirdByte { get; private set; }
    public byte FourthByte { get; private set; }
 
    public IPAddress(byte first, byte second, byte third, byte fourth)
    {
        FirstByte = first;
        SecondByte = second;
        ThirdByte = third;
        FourthByte = fourth;
    }
 
    public static bool TryParse(string ipAddress, out IPAddress ipaddress)
    {
        // some validation logic & parsing logic
        // ...
    }
 
    public override string ToString()
    {
        return string.Format("{0}.{1}.{2}.{3}", FirstByte, SecondByte, ThirdByte, FourthByte);
    }
}
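The body of TryParse is left out above. As a rough idea of what it could look like, here is a minimal sketch. The validation rules (exactly four dot-separated parts, each made only of digits and fitting in a byte) and the parameter names `text` and `address` are my assumptions for illustration, not the logic from the original article.

```csharp
using System;
using System.Globalization;

// Sketch only: the IPAddress class from above with one possible TryParse body.
public class IPAddress
{
    public byte FirstByte { get; private set; }
    public byte SecondByte { get; private set; }
    public byte ThirdByte { get; private set; }
    public byte FourthByte { get; private set; }

    public IPAddress(byte first, byte second, byte third, byte fourth)
    {
        FirstByte = first;
        SecondByte = second;
        ThirdByte = third;
        FourthByte = fourth;
    }

    public static bool TryParse(string text, out IPAddress address)
    {
        // The out parameter must be assigned on every path.
        address = null;

        if (string.IsNullOrWhiteSpace(text))
            return false;

        // Assumption: dotted IPv4 notation with exactly four parts.
        string[] parts = text.Split('.');
        if (parts.Length != 4)
            return false;

        var bytes = new byte[4];
        for (int i = 0; i < parts.Length; i++)
        {
            // NumberStyles.None accepts digits only; values above 255 fail.
            if (!byte.TryParse(parts[i], NumberStyles.None, CultureInfo.InvariantCulture, out bytes[i]))
                return false;
        }

        address = new IPAddress(bytes[0], bytes[1], bytes[2], bytes[3]);
        return true;
    }

    public override string ToString()
    {
        return string.Format("{0}.{1}.{2}.{3}", FirstByte, SecondByte, ThirdByte, FourthByte);
    }
}
```

With something like this in place, the values null, "" and "foobar" from the beginning of the article are rejected at the single point where a string is turned into an IPAddress.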

I switched from primitive type obsession to “domain modeling” by creating a value object. This kind of refactoring is helpful in a lot of other cases. For example, country codes are often stored in a string even though they are only 2 or 3 characters long and have a pre-defined set of possible values. Or an amount is represented by an integer, which can be negative and completely leaves out the currency of this amount. These are only a few examples among many.
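To make the amount example a bit more concrete, here is a small sketch of what such a value object might look like. The Money class, its members, and its rules (no negative amounts, a 3-letter currency code, no mixing of currencies) are hypothetical and only serve as an illustration. Real rules depend on your domain.

```csharp
using System;
using System.Globalization;

// Hypothetical Money value object; names and rules are assumptions.
public sealed class Money : IEquatable<Money>
{
    public decimal Amount { get; private set; }
    public string Currency { get; private set; }

    public Money(decimal amount, string currency)
    {
        // Assumption: in this imaginary domain an amount is never negative.
        if (amount < 0)
            throw new ArgumentOutOfRangeException("amount", "Amount cannot be negative.");
        if (currency == null || currency.Length != 3)
            throw new ArgumentException("Expected a 3-letter currency code.", "currency");

        Amount = amount;
        Currency = currency.ToUpperInvariant();
    }

    public Money Add(Money other)
    {
        if (other == null)
            throw new ArgumentNullException("other");
        // The currency travels with the amount, so mixing can be detected.
        if (Currency != other.Currency)
            throw new InvalidOperationException("Cannot add amounts in different currencies.");

        return new Money(Amount + other.Amount, Currency);
    }

    // Value objects are compared by their content, not by reference.
    public bool Equals(Money other)
    {
        return other != null && Amount == other.Amount && Currency == other.Currency;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Money);
    }

    public override int GetHashCode()
    {
        return Amount.GetHashCode() ^ Currency.GetHashCode();
    }

    public override string ToString()
    {
        return string.Format(CultureInfo.InvariantCulture, "{0} {1}", Amount, Currency);
    }
}
```

A method that takes a Money parameter can no longer receive a bare integer by accident, and the checks live in one place instead of being repeated wherever the amount is used.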

It’s easy to fall into the trap of this code smell. Using primitive data types is quick, but it lets invalid values through, which you then have to check for everywhere, whereas value objects protect the code from such values.

See you next time!


Image credits:

http://erikapov.blogspot.fr/2010/01/old-vs-new.html

11 thoughts on “Primitive type obsession”

  1. This is a very interesting discussion here.

    I tend to think the main problem with primitive type obsession is a lack of proper abstraction. I have seen many code examples where there are methods with about 8 parameters when actually most of those parameters could have been grouped together, so that only a couple of arguments are used across many methods, resulting in simpler and more readable code. In other words, primitive obsession is bad because it results in over-complicated code.

    At first glance this post looked to be a good example of primitive type obsession as the opposite of this: as an oversimplification, that warrants DDD to the rescue. Having read Darren’s arguments, I am inclined to agree that although the simpler code could have problems, on the whole it is preferable to the more complex code.

    I am partway through Eric Evans’s book. It is undoubtedly a good book, perhaps a great one, but I have heard many people comment that it has led them to write worse code rather than better code. So, although I believe that Eric Evans’s advice is very good, it can very easily lead to designs that are more complex than they need to be.

    So don’t feel bad that this wasn’t the best example. It is actually a very good example because it is a good example of how hard Domain Driven Design is to master.

  2. I disagree entirely. You call storing IP Address as a string a code smell, a “primitive type obsession,” and a trap, which are all very loaded and scary terms. I don’t want any of those! But I’m afraid by doing this, you’re trading one potential problem for a definite problem.

    I think the central issue here is your statement “the compiler will not prevent me from writing the following code.” Compilers are not designed to prevent you from writing bad code or validation – they compile code to be executed later. For every compile-time check a compiler gives you, there are a billion runtime errors that can be had. I don’t know why a compiler should be a consideration when designing data-pushing apps, like one that handles an IP address in this way.

    That aside, I don’t see a problem that’s being solved. You’ll be receiving IP address as string from practically any outside source. If you’re persisting the address, it will most likely be as a string. If someone provides null, or “”, or “foobar” as an IP address, it will result in a null IPAddress object. Any code that uses IPAddress will have to consider the fact that it could be invalid — you’ve just converted the form of “invalid” from a function call to something like IPAddress.IsValid to a single null reference.

    The cost of this (essentially, a refactor) is more complicated code. This IPAddress object of yours is a better representation of an IP address, but I bet your application does not exist to provide the best domain representation of IP addresses in code. It exists to push data around and do things based on that data. Your new abstraction will only make it harder to accomplish that task.

    • Hello Darren, thank you for your comment on this topic.

      When I wrote this article I was wondering whether the IP address was a good example for the topic, and from what I see in your comment I think I might have chosen poorly.

      In this blog post I only intended to show how to switch from a primitive type to a value object in order to add more control and behavior on a business domain object.

      In my case, since it is just an example, I don’t really have a business domain to model, so it does not really make sense to do what I did. As usual, it depends: if you need to be able to retrieve subnet masks and gateway information from an IP address, I still think that the value object will be more helpful than a string.

      For the mention of the compiler, for sure it will not prevent runtime errors. But it’s a tool we can use in order to prevent mistakes when designing an application with value object types.

      I only wanted to show a software development concept and a trap we can sometimes fall into, it is not a silver bullet, just a practice that can be relevant from time to time depending on the context.

      • Hi Julien,

        Oh no, don’t concede anything to me! :P

        I think the IP address is actually a good example, because I think a lot of types fall within this situation. We take them as simple bits of data, but the ways we use the data can quickly expand based on our needs. Like you mention with an IP address… we want to deal with it as an actual IP address, not a string. But it comes in as a string, so what do we do?

        A general feeling I have about well-architected software is that the details are pushed from the core of the application to the exterior. So yes… we will want to interact with IP address in certain ways, but we’ll only want to do this in a relatively few portions of very specific code. Perhaps validation? Firing a request to that IP address? Just bits like that.

        In those cases, I’d suggest using an IP address object, but consider it as a “helper”. You look up the object, it has a string IP address, and in each case you instantiate a new IP address object with that string… and then you get all of the benefits you’re talking about, but the detail of the actual conversion from string to a more complex type is kept on the exterior, not in the core of your application (where the domain object that *has* an IP address is built).

      • Hello, Julien.

        IPAddress is a good example. Other common examples: using decimal instead of a Money type, and ZipCode as a string instead of a separate class (concept).

      • Hello Elias,
        Thank you for your comment. I like your examples; there are plenty of cases where primitive types are used instead of a specific domain object.
        One thing I see a lot, which is some kind of primitive type obsession as well, is when a method takes a lot of primitive type parameters instead of a single object encapsulating these parameters.

    • Hello, Darren.
      Let me disagree with your statement that “Compilers are not designed to prevent you from writing bad code or validation”.
      Actually, the entire concept of TDD was invented in order to speed up the feedback about how healthy your application is right now. The compiler is the fastest way to get that feedback. The more errors can be caught by a compiler, the less of your employer’s money you spend on debugging and development. So your statement does not seem correct. There was a post by Mark Seemann on the topic of getting feedback as soon as we can, but I can’t find it right now.

      • Engineerspock… let me ask you… how’s that working out for you?

        Compiler checks, that is. I’m assuming you use one, like many of us… so is your code bug free? What about the rest of us?

        Right… lots of bugs seem to make it through the compiler, anyway, and employers spend lots of money on development and debugging.

        You are correct that compilers give you fast feedback, just like TDD… but the type of feedback they provide is very different. Compilers will tell you that your syntax is correct and that the code can be compiled, but TDD will tell you if the program is behaving the way you expect (*). Which is more relevant to the users of our software?

        Using compilers to save time and money in development… it’s an interesting theory that has been disproven over and over and over in practice.

        (*) BTW, by running the application and checking its behaviors, you also get the same checking that a compiler would provide. Because if the code is bad, how would the tests pass?

        The thing is that we really have the power to write mindless code, which we will be able to check only at runtime (type safety, code contracts on abstract classes which represent a concept rather than some primitive). And when you can shift checking to the compilation stage, you must do it.

      • Engineerspock, but I asked a specific question:

        How is that working out for you?

        How is that saving employers money and time developing software, and how is that preventing bugs?

        Because as far as I can tell, it’s done nothing. It’s probably made things worse, even… because we have programmers modeling stuff in code, a compiler pats us on the back and says, “Your representation exists,” and we pretend it maps to the business. But it doesn’t, and bugs abound, and we debate abstracting IP addresses or not.

      • It works for our team very well. Regarding the original topic and compilers in addition, consider the following example:
        For some time our team had been using primitive types for representing money. Int32 was used for cents and decimal for dollars. Using only decimals will not save you, because sometimes we were forced to take the right part of the value. To be more precise, we had to interact with a 3rd-party system which required money values to be passed in as cents using Int32. Propagating primitives caused some conversion bugs in the end. After we introduced a Money type, we eliminated such bugs once and for all. Now, when my function requires a value of the Money type, I’m sure that nobody will pass an Int32 value or something else. By abstracting away the concept of Money, in conjunction with the compiler’s power, we can be sure that Money values are OK everywhere. This reduces the overall entropy regarding this problem down to 0 (or very close to 0). In order to explore this topic deeper I will write a post; it’s hard to go deeper in comments))) And by the way, thanks for the discussion!
