Metaclasses in Python

What is a metaclass, someone may ask? In short, a metaclass is to a class what a class is to an object.

Both of my previous articles were pretty heavily finance-flavoured, so I've decided to take a step back and have a little break by writing about something totally unrelated. I hope you'll like it.

Metaclasses are not the most popular aspect of the Python language; they don't come up in every conversation. Still, they are used in quite a few high-profile projects, e.g. the Django ORM[2], the standard library's Abstract Base Classes (ABCs)[3] and the Protocol Buffers implementation[4].

It is a complex feature that lets the programmer customize some of the most basic mechanisms of the language. This flexibility leaves room for abuse, but we've all heard it before: with great power comes great responsibility.

This topic is usually not covered by tutorials and introductions to the language, as it is deemed "advanced", but everyone has to start somewhere. After a quick search, the best introduction to the subject I found online was an answer to a StackOverflow question[1].

Let's start our own journey. All code examples are written for Python 3.6, the newest version at the time of writing.

First encounter

We've talked about it a bit already, but we haven't seen a metaclass yet. That will change soon, but please bear with me for a while. We'll start with something simple: creating an object.

>>> o = object()
>>> print(type(o))
<class 'object'>

We've created a new object and stored a reference to it in the variable o. The type of o is object.

We can define our own class as well:

>>> class A:
...     pass
...
>>> a = A()
>>> print(type(a))
<class '__main__.A'>

We now have two badly named variables, a and o, and we can verify that they belong to the corresponding classes:

>>> isinstance(o, object)
True
>>> isinstance(a, A)
True
>>> isinstance(a, object)
True
>>> issubclass(A, object)
True

One interesting thing that happened above is that a is also an instance of object. That is because class A is a subclass of object (every user-defined class derives from object).
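You can check this relationship directly: every class records its direct base classes in __bases__, and its method resolution order, __mro__, ends with object. A minimal sketch:

```python
class A:
    pass

# Even without explicit bases, a class statement makes object the base class
assert A.__bases__ == (object,)

# The method resolution order starts with A itself and ends with object
assert A.__mro__ == (A, object)

# Consequently, every instance of A is also an instance of object
assert isinstance(A(), object)
```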

Another is that we can use a and A interchangeably in many contexts. For a function like print, it doesn't make much difference whether we supply a or A: both calls print "something".

Let's see this in more detail with our newly defined class B:

>>> class B:
...     def __call__(self):
...         return 5
...
>>> b = B()
>>> print(b)
<__main__.B object at 0x1032a5a58>
>>> print(B)
<class '__main__.B'>
>>> b.value = 6
>>> print(b.value)
6
>>> B.value = 7
>>> print(B.value)
7
>>> print(b())
5
>>> print(B())
<__main__.B object at 0x1032a58d0>

As we can see, b and B behave similarly in multiple ways. We can even use both in a function-call expression; they just return different things: b() returns 5, as defined in the class body, while B() creates a new instance of the class.

This similarity is not an accident; it's part of the language's design. In Python, classes are first-class citizens[5]: they behave like any other object.
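Being "first-class" means a class can be handled like any other value. The sketch below (the make_instance helper and the registry dictionary are made up for illustration) stores a class in a dictionary, passes it to a function and attaches a new attribute to it at runtime:

```python
class A:
    pass


def make_instance(cls):
    """Take a class as an ordinary argument and call it to build an instance."""
    return cls()


# Classes can be stored in data structures...
registry = {"a": A, "int": int}

# ...passed to functions...
obj = make_instance(registry["a"])
zero = make_instance(registry["int"])

# ...and given new attributes after they were created
A.description = "added after the class was created"

print(type(obj).__name__)  # A
print(obj.description)     # added after the class was created
print(zero)                # 0
```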

Furthermore, if classes are objects, they must have a type of their own:

>>> print(type(object))
<class 'type'>
>>> print(type(A))
<class 'type'>
>>> isinstance(object, type)
True
>>> isinstance(A, type)
True
>>> isinstance(A, object)
True
>>> issubclass(type, object)
True

It turns out that both object and A are of class type. type is the "default metaclass", and all other metaclasses have to derive from it. It may be slightly confusing at this point that type is the name of a class but, at the same time, also a function that returns the type of the supplied object (type has completely different semantics depending on whether you give it 1 or 3 arguments). It's kept like that for historical reasons.
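To make the two faces of type concrete, here is a small sketch: called with one argument, type reports the type of an object; called with three, it builds a brand-new class. The Point class is made up for illustration:

```python
# One argument: type() returns the type of the given object
print(type(42))      # <class 'int'>
print(type("text"))  # <class 'str'>

# Three arguments: type() creates a new class from a name,
# a tuple of base classes and a namespace dictionary
Point = type("Point", (), {"x": 0, "y": 0})

print(Point.__name__)      # Point
print(type(Point) is type) # True: the new class is itself an instance of type
print(Point().x)           # 0
```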

Both object and A are instances of object as well -- they're all objects in the end. What's the type of type then, one may ask?

>>> print(type(type))
<class 'type'>
>>> isinstance(type, type)
True

It turns out it doesn't go any deeper than that, as type is its own type.

That's the whole trick behind metaclasses. We created A, a subclass of object, so that a new instance a was of type A and therefore also an object. In the same way, we can create a subclass of type called Meta and use it as the type of new classes; those classes will be instances of both type and Meta.

Let's see it in practice:

class Meta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        print("Creating new class: {}".format(cls))

    def __call__(cls, *args, **kwargs):
        # Forward any constructor arguments to the default type.__call__
        new_instance = super().__call__(*args, **kwargs)
        print("Class {} new instance: {}".format(cls, new_instance))
        return new_instance

This is our first metaclass. We could have made this definition even more minimal, but I wanted it to do something at least a tiny bit useful:

  • It overrides the __init__ magic method to print a message whenever a new instance of Meta (that is, a new class) is created.
  • It overrides the __call__ magic method to print a message whenever function-call syntax is used on an instance, i.e. when someone writes variable().

It turns out that creating an instance of a class takes the same form in Python as calling a function. If we have a function f, we write f() to call it. If we have a class A, we write A() to create a new instance. Hence, we use the __call__ hook.

Still, the metaclass by itself is not that interesting. It only becomes interesting when we create an instance of it. Let's do that now:

>>> class C(metaclass=Meta):
...     pass
...
Creating new class: <class '__main__.C'>
>>> c = C()
Class <class '__main__.C'> new instance: <__main__.C object at 0x10e99ae48>
>>> print(c)
<__main__.C object at 0x10e99ae48>

Our metaclass indeed works as intended, printing messages when certain events in the lifecycle of a class happen. It is important to understand that we are working on three different levels of abstraction here: metaclass, class, and instance.

When we write class C(metaclass=Meta), we create C, which is an instance of Meta: Meta.__init__ is called, and the message is printed. In the next step, we call C() to create a new instance of class C, and this time Meta.__call__ is executed. As the last step, we print the instance c, which calls C.__str__; that in the end resolves to the default implementation inherited from the object base class.
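To see what "Meta.__call__ is executed" involves in more detail, here is a sketch with a logging-only variant of Meta that records events in a list instead of printing them. Our override delegates to type.__call__ through super(), and it is type.__call__ that creates the object with __new__ and then initializes it with __init__:

```python
events = []


class Meta(type):
    def __call__(cls, *args, **kwargs):
        events.append("Meta.__call__")
        # The default type.__call__ runs __new__ and then __init__
        return super().__call__(*args, **kwargs)


class C(metaclass=Meta):
    def __new__(cls):
        events.append("C.__new__")
        return super().__new__(cls)

    def __init__(self):
        events.append("C.__init__")


c = C()
print(events)  # ['Meta.__call__', 'C.__new__', 'C.__init__']

# C() is shorthand for calling the metaclass's __call__ with C as its first argument
events.clear()
c2 = type(C).__call__(C)
print(events)  # ['Meta.__call__', 'C.__new__', 'C.__init__']
```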

We can inspect the types of all our variables:

>>> print(type(C))
<class '__main__.Meta'>
>>> isinstance(C, Meta)
True
>>> isinstance(C, type)
True
>>> issubclass(Meta, type)
True
>>> print(type(c))
<class '__main__.C'>
>>> isinstance(c, C)
True
>>> isinstance(c, object)
True
>>> issubclass(C, object)
True
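One more property is worth knowing before the practical examples: a metaclass is inherited. A subclass of C is also created by Meta, even though its class statement never mentions it. A minimal sketch with a simplified Meta that only reports class creation:

```python
class Meta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        print("Creating new class:", name)


class C(metaclass=Meta):  # prints: Creating new class: C
    pass


class D(C):               # prints: Creating new class: D (no metaclass= needed)
    pass


print(type(D) is Meta)    # True
```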

I've tried to give a gentle introduction to the topic of metaclasses, and I hope you now have a feel for what they are and how we can use them. In my opinion, though, this piece would be worthless without a few practical examples. That's what comes next.

Useful example: Singleton

In this section, we'll write a very tiny library using a bit of metaclass magic. We'll implement a "blueprint" for the singleton design pattern[6]: a class that can only have a single instance.

To be honest, it can also be implemented without metaclasses at all, by overriding the __new__ method in a base class to return a previously stored instance:

class SingletonBase:
    instance = None

    def __new__(cls, *args, **kwargs):
        if cls.instance is None:
            # object.__new__ must not receive the constructor arguments;
            # Python passes them on to __init__ separately
            cls.instance = super().__new__(cls)

        return cls.instance

That's it. Any class deriving from SingletonBase now behaves as a singleton. Let's see it in action:

>>> class A(SingletonBase):
...     pass
...
>>> class B(A):
...     pass
...
>>> print(A())
<__main__.A object at 0x10c8d8710>
>>> print(A())
<__main__.A object at 0x10c8d8710>
>>> print(B())
<__main__.A object at 0x10c8d8710>

The approach we've taken here seems to work: every attempt to create an instance returns the same object. But there is a behavior we may not have anticipated: when trying to create an instance of class B, we get back the same instance of A as before. B has no instance attribute of its own, so cls.instance finds the one inherited from A, which is no longer None.

This issue can be fixed without resorting to metaclasses, but it has a very clean solution using them, so why not use them?
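For comparison, here is one possible metaclass-free fix. It is a sketch that relies on the __init_subclass__ hook added in Python 3.6 (PEP 487), which runs every time a subclass is created; the name SingletonBaseNoMeta is hypothetical:

```python
class SingletonBaseNoMeta:
    instance = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Give every subclass its own slot instead of inheriting the parent's
        cls.instance = None

    def __new__(cls, *args, **kwargs):
        if cls.instance is None:
            cls.instance = super().__new__(cls)
        return cls.instance


class A(SingletonBaseNoMeta):
    pass


class B(A):
    pass


print(A() is A())     # True
print(type(B()) is B) # True: B now gets its own instance
print(A() is B())     # False
```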

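We'll write a metaclass, SingletonMeta, that sets the instance attribute to None on every class it creates, and a base class, SingletonBaseMeta, that uses it. This way every subclass of SingletonBaseMeta gets its own instance slot on creation.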

Here it is:

class SingletonMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls.instance = None

    def __call__(cls, *args, **kwargs):
        if cls.instance is None:
            cls.instance = super().__call__(*args, **kwargs)

        return cls.instance


class SingletonBaseMeta(metaclass=SingletonMeta):
    pass

We can try and see if this approach works:

>>> class A(SingletonBaseMeta):
...     pass
...
>>> class B(A):
...     pass
...
>>> print(A())
<__main__.A object at 0x1101f6358>
>>> print(A())
<__main__.A object at 0x1101f6358>
>>> print(B())
<__main__.B object at 0x1101f6eb8>

Congratulations, our new singleton library seems to work just as we planned!
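There is also a subtler difference between the two versions. Because SingletonMeta.__call__ returns the stored instance before type.__call__ is reached, __init__ runs only once. With the __new__-based SingletonBase, Python still calls __init__ on every A(), which can silently reset state. A sketch that counts the calls (the WithNew/WithMeta classes and the init_calls counter exist only for illustration):

```python
class SingletonBase:
    instance = None

    def __new__(cls, *args, **kwargs):
        if cls.instance is None:
            cls.instance = super().__new__(cls)
        return cls.instance


class SingletonMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls.instance = None

    def __call__(cls, *args, **kwargs):
        if cls.instance is None:
            cls.instance = super().__call__(*args, **kwargs)
        return cls.instance


class WithNew(SingletonBase):
    init_calls = 0

    def __init__(self):
        # Count on the class, so the total survives across calls
        type(self).init_calls += 1


class WithMeta(metaclass=SingletonMeta):
    init_calls = 0

    def __init__(self):
        type(self).init_calls += 1


for _ in range(3):
    WithNew()
    WithMeta()

print(WithNew.init_calls)   # 3: __init__ ran on every call
print(WithMeta.init_calls)  # 1: __init__ ran only for the first instance
```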

Since we're experienced metaclass library designers now, let's take on a larger challenge.

Useful example: Simplistic ORM

As discussed previously, the Singleton pattern can be implemented nicely with a little help from metaclasses, but they're not really necessary. The majority of real-world projects that use metaclasses implement some variation of an ORM[7].

As an exercise, we'll build something similar, although greatly simplified. It'll be a serialization/deserialization layer between Python classes and JSON.

This is how we want the interface to look (modelled on the Django ORM/SQLAlchemy):

class User(ORMBase):
    """ A user in our system """
    id = IntField(initial_value=0, maximum_value=2**32)
    name = StringField(maximum_length=200)
    surname = StringField(maximum_length=200)
    height = IntField(maximum_value=300)
    year_born = IntField(maximum_value=2017)

We want to be able to define classes together with their typed fields, and from that be able to serialize instances into JSON:

>>> u = User()
>>> u.name = "Guido"
>>> u.surname = "van Rossum"
>>> print("User ID={}".format(u.id))
User ID=0
>>> print("User JSON={}".format(u.to_json()))
User JSON={"id": 0, "name": "Guido", "surname": "van Rossum", "height": null, "year_born": null}

And deserialize it:

>>> w = User('{"id": 5, "name": "John", "surname": "Smith", "height": 185, "year_born": 1989}')
>>> print("User ID={}".format(w.id))
User ID=5
>>> print("User NAME={}".format(w.name))
User NAME=John

None of the above really requires metaclasses, so let's implement one "killer" feature: validation.

>>> w.name = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "simple-orm.py", line 96, in __setattr__
    raise AttributeError('Invalid value "{}" for field "{}"'.format(value, key))
AttributeError: Invalid value "5" for field "name"
>>> w.middle_name = "Stephen"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "simple-orm.py", line 98, in __setattr__
    raise AttributeError('Unknown field "{}"'.format(key))
AttributeError: Unknown field "middle_name"
>>> w.year_born = 3000
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "simple-orm.py", line 96, in __setattr__
    raise AttributeError('Invalid value "{}" for field "{}"'.format(value, key))
AttributeError: Invalid value "3000" for field "year_born"

Revisiting the type constructor

Before we move on to implementing the ORM library, I need to go back a bit and talk about one more thing: the type constructor. I've mentioned it only briefly, but it is an important topic that deserves a closer look.

You probably remember the moment in the previous section when I defined the __init__ method for our first metaclass:

class Meta(type):
    def __init__(cls, name, bases, namespace):
        ...

Where do these three arguments, name, bases, and namespace, come from? They are the parameters of the type constructor, and together they fully describe the class being created:

  • name - the name of the class as a string
  • bases - a tuple of base classes; it can be empty
  • namespace - a dictionary of everything defined inside the class body. All methods and class variables go in here.
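A quick way to see these three values is a metaclass that simply records what it receives. In this sketch, RecordingMeta is a hypothetical name; note that the namespace also contains a few bookkeeping entries added by the interpreter, such as __module__ and __qualname__:

```python
class RecordingMeta(type):
    """Hypothetical metaclass that stores the arguments of each class creation."""
    calls = []

    def __new__(mcs, name, bases, namespace):
        mcs.calls.append((name, bases, dict(namespace)))
        return super().__new__(mcs, name, bases, namespace)


class Base:
    pass


class A(Base, metaclass=RecordingMeta):
    X = 5

    def f(self):
        return self.X


name, bases, namespace = RecordingMeta.calls[0]
print(name)                 # A
print(bases == (Base,))     # True
# Skip interpreter-added dunder entries and show what the class body defined
print(sorted(k for k in namespace if not k.startswith("__")))  # ['X', 'f']
```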

That's all there is. In fact, instead of defining a class with the class statement, we can call the type constructor directly:

class A:
    X = 5

    def f(self):
        print("Class A {}".format(self))


def f(self):
    print("Class B {}".format(self))

B = type("B", (), {'X': 6, 'f': f})

In this code, we've defined two almost identical classes, A and B. They have different values assigned to the class variable X and print different messages when the method f is called. But that's all: there are no fundamental differences, and both ways of defining classes are equivalent. For a class statement, the Python interpreter executes the class body to build the namespace dictionary and then calls type (or the metaclass, if one is given) with these three arguments.

>>> print(A)
<class '__main__.A'>
>>> print(B)
<class '__main__.B'>
>>> print(A.X)
5
>>> print(B.X)
6
>>> a = A()
>>> b = B()
>>> a.f()
Class A <__main__.A object at 0x1023432b0>
>>> b.f()
Class B <__main__.B object at 0x1023431d0>

That's the stage that defining your own metaclass lets you hook into. You can intercept the parameters supplied to the type constructor, modify them, and create your class the way you want it.
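As a small illustration of intercepting and modifying those parameters, here is a sketch of a hypothetical UpperAttrMeta that renames every public class attribute to upper case before handing the namespace on to type:

```python
class UpperAttrMeta(type):
    """Hypothetical metaclass: renames every public class attribute to upper case."""

    def __new__(mcs, name, bases, namespace):
        new_namespace = {
            # Leave dunder entries such as __module__ and __qualname__ untouched
            (key if key.startswith("__") else key.upper()): value
            for key, value in namespace.items()
        }
        # Pass the modified arguments on to type, which builds the class
        return super().__new__(mcs, name, bases, new_namespace)


class Config(metaclass=UpperAttrMeta):
    debug = True
    level = 3


print(hasattr(Config, "debug"))  # False
print(Config.DEBUG, Config.LEVEL)  # True 3
```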

Simplistic ORM - literate program

We already know what we want - an implementation of a library satisfying the specified interface. We also know the tool we need for the job – metaclasses.

I'll proceed with the implementation, in a literate programming style. The code from this section can be loaded into the Python interpreter and run.

We'll be using only one module from the standard library, for JSON parsing/serialization:

import json

Next, we'll define a base class for all our fields. It is quite simple, as most of the individual parts of this library will be. It contains a stub implementation of a validation function and an empty initial value.

class Field:
    """ Base class for all Fields. Every field needs an initial value """

    def __init__(self, initial_value=None):
        self.initial_value = initial_value

    def validate(self, value):
        """ Check if this is a valid value for this field """
        return True

For simplicity, I'll implement only two subclasses of Field: IntField and StringField. One can add more when needed.

class StringField(Field):
    """ A string field. Optionally validates length of a string """

    def __init__(self, initial_value=None, maximum_length=None):
        super().__init__(initial_value)

        self.maximum_length = maximum_length

    def validate(self, value):
        """ Check if this is a valid value for this field """
        if super().validate(value):
            return (value is None) or (isinstance(value, str) and self._validate_length(value))
        else:
            return False

    def _validate_length(self, value):
        """ Check if string has correct length """
        return (self.maximum_length is None) or (len(value) <= self.maximum_length)


class IntField(Field):
    """ An integer field. Optionally validates that the integer does not exceed a maximum value """

    def __init__(self, initial_value=None, maximum_value=None):
        super().__init__(initial_value)

        self.maximum_value = maximum_value

    def validate(self, value):
        """ Check if this is a valid value for this field """
        if super().validate(value):
            return (value is None) or (isinstance(value, int) and self._validate_value(value))
        else:
            return False

    def _validate_value(self, value):
        """ Check if integer falls in desired range """
        return (self.maximum_value is None) or (value <= self.maximum_value)

Except for forwarding initial_value to the base class constructor, most of this code is just validation routines. Again, it is not very hard to add more validations, but I just wanted to show you the simplest proof of concept possible.

In StringField, we check that a value is of the correct type (str) and that its length is less than or equal to the maximum length (if one is defined). In IntField, we check that a value is an integer and that it is less than or equal to the supplied maximum value.

It is important to note that we allow field values to be None. An interesting exercise for the reader might be to implement required fields that are not allowed to be None.

The next piece is our metaclass:

class ORMMeta(type):
    """ Metaclass of our own ORM """
    def __new__(mcs, name, bases, namespace):
        # Collect every class attribute that is a Field instance
        fields = {
            field_name: field
            for field_name, field in namespace.items()
            if isinstance(field, Field)
        }

        new_namespace = namespace.copy()

        # Remove fields from class variables (a distinct loop variable
        # keeps the class `name` argument intact)
        for field_name in fields:
            del new_namespace[field_name]

        new_namespace['_fields'] = fields

        return super().__new__(mcs, name, bases, new_namespace)

Our metaclass is not complex at all. It has one job, and one job only: to gather all instances of Field into a new class variable called _fields. All field instances are also removed from the class dictionary.

The only thing we need our metaclass for is to hook into the moment our class is created, take all field definitions and store them in one place.

Most of the actual work is done in the base class of our library:

class ORMBase(metaclass=ORMMeta):
    """ User interface for the base class """

    def __init__(self, json_input=None):
        for name, field in self._fields.items():
            setattr(self, name, field.initial_value)

        # If there is a JSON supplied, we'll parse it
        if json_input is not None:
            json_value = json.loads(json_input)

            if not isinstance(json_value, dict):
                raise RuntimeError("Supplied JSON must be a dictionary")

            for key, value in json_value.items():
                setattr(self, key, value)

    def __setattr__(self, key, value):
        """ Magic method setter """
        if key in self._fields:
            if self._fields[key].validate(value):
                super().__setattr__(key, value)
            else:
                raise AttributeError('Invalid value "{}" for field "{}"'.format(value, key))
        else:
            raise AttributeError('Unknown field "{}"'.format(key))

    def to_json(self):
        """ Convert given object to JSON """
        new_dictionary = {}

        for name in self._fields.keys():
            new_dictionary[name] = getattr(self, name)

        return json.dumps(new_dictionary)

Class ORMBase has three methods, each performing a specific task:

  • __init__ - First, set all fields to their initial values. Then, if a JSON document is passed as a parameter, parse it and assign the values read to our model fields.
  • __setattr__ - A magic method hook that gets called whenever someone tries to assign a value to an attribute of an instance. When someone writes object.attribute = value, Python calls object.__setattr__("attribute", value). Overriding this method allows us to modify the default behavior, in our case by injecting validation code.
  • to_json - The simplest method in the class. It takes all field values and serializes them into a JSON document.
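The __setattr__ hook is the heart of the validation, so here it is in isolation. This sketch uses a made-up Positive class that only accepts positive numbers for any attribute, and otherwise falls back to the default behavior via super(), just like ORMBase does:

```python
class Positive:
    """Hypothetical example: every attribute must be a positive number."""

    def __setattr__(self, key, value):
        # Called for every `obj.key = value`, including assignments in __init__
        if not isinstance(value, (int, float)) or value <= 0:
            raise AttributeError('Invalid value "{}" for "{}"'.format(value, key))
        # The default implementation stores the value in the instance __dict__
        super().__setattr__(key, value)


p = Positive()
p.width = 10
print(p.width)  # 10

try:
    p.width = -1
except AttributeError as error:
    print(error)  # Invalid value "-1" for "width"

print(p.width)  # 10: the rejected assignment left the old value in place
```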

And that's the whole implementation – our library is ready now. Feel free to check if it works as expected and modify it if you think it should work differently.

>>> User('{"name": 5}')
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-1-76a1a93378fc>", line 1, in <module>
    User('{"name": 5}')
  File "/Users/jrx/repos/metaclass-playground/simple-orm.py", line 86, in __init__
    setattr(self, key, value)
  File "/Users/jrx/repos/metaclass-playground/simple-orm.py", line 94, in __setattr__
    raise AttributeError('Invalid value "{}" for field "{}"'.format(value, key))
AttributeError: Invalid value "5" for field "name"

If you've got any questions or anything is unclear, please let me know in the comments.

Closing remarks

Code for this post can be downloaded from the GitHub repository[8].

I hope you've enjoyed this article and found something insightful. Metaclasses may feel a bit obscure and not always useful. However, they certainly allow for constructing elegant libraries and interfaces, when used properly.

You can read more about how metaclasses are used in real life in [9].

If there is any particular topic you'd like me to cover, please let me know.

References