Python Has A Major Scalability Flaw! - System Design

Scaling applications is a necessity in today’s tech world.

An application’s usage can grow exponentially, so applications should be designed to handle an ever-increasing number of requests per second as that happens.

Before talking about scalability, we need to be familiar with a few computer science concepts.

Let’s talk about them!

What is a Process? - A Process is a running instance of a program/application. A program can have multiple instances of itself (Processes) running at the same time.

What is a Thread? - A Thread represents the actual processor instructions that are being executed in the context of a Process. Each Process has at least one thread executing its instructions.

What is Multi-threading? - In Multi-Threading, multiple threads work together, executing instructions within the context of a process in parallel. Threads within a process share state information, memory, and many other attributes.

Multi-Threaded Application

In a multi-threaded program, the application process can utilize multiple cores of the computer’s CPU and run several threads in parallel.

This is done to improve computational efficiency and reduce execution time. It forms the basis of vertical scaling, where the next step is to add more (and faster) processing cores. But this approach has limits: there is not always more processing power to upgrade to, and it gets expensive!

To learn more about scaling, check out this blog on Horizontal vs. Vertical Scaling -

Cost of Horizontal vs. Vertical Scaling

Multi-Threading in Python Applications

Let’s write a function called countdown and execute it
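The original snippet isn’t reproduced here, so below is a minimal sketch of what it might look like; the 20-million iteration count is an assumption, and exact timings will vary by machine:

```python
import time

def countdown(n):
    # Busy loop: purely CPU-bound work, no I/O
    while n > 0:
        n -= 1

start = time.perf_counter()
countdown(20_000_000)  # assumed workload size
elapsed = time.perf_counter() - start
print(f"Single-threaded: {elapsed:.3f} s")
```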

This function takes 1.946 seconds to finish its execution.

Let’s use multi-threading and execute this function again.
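Again as a sketch, assuming the same workload as before, split evenly between two threads:

```python
import threading
import time

def countdown(n):
    # Same CPU-bound busy loop as before
    while n > 0:
        n -= 1

COUNT = 20_000_000  # assumed workload; each thread handles half

t1 = threading.Thread(target=countdown, args=(COUNT // 2,))
t2 = threading.Thread(target=countdown, args=(COUNT // 2,))

start = time.perf_counter()
t1.start()
t2.start()
t1.join()
t2.join()
elapsed = time.perf_counter() - start
print(f"Two threads: {elapsed:.3f} s")
```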

One would expect this to run twice as fast because we have used two threads to execute this function.

Surprisingly, this does not work the way we expect it to!

Do you know why?

The reason for this is the Python GIL (Global Interpreter Lock).

Let’s talk about where it comes from.


CPython is the default and most widely used reference implementation of Python.

It is written in C and Python. CPython compiles Python code into bytecode and then interprets that bytecode. Each CPython interpreter process uses the GIL.

But, What is GIL?

The GIL, or Global Interpreter Lock, is a mutual exclusion (mutex) lock. It is a way to avoid race conditions.

It ensures that, within a Python interpreter process, only one thread may be executing Python bytecode at any one time.
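As a concrete detail, CPython exposes the interval at which it asks the thread holding the GIL to give other threads a chance to run (typically 0.005 s, i.e. 5 ms, by default in Python 3):

```python
import sys

# The "switch interval": how often CPython requests that the
# currently running thread release the GIL so another thread can run.
print(sys.getswitchinterval())
```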

What Problem Does This Cause?

Applications can handle two types of operations:

  • CPU bound

  • I/O bound

CPU-bound operations are computationally expensive and are limited by the capability of the processor for completion.

For example, complex mathematical operations are CPU-bound.

On the other hand, I/O-bound operations are limited by the time spent waiting for input/output to complete.

For example, database queries and network requests are I/O-bound.

The GIL works against multi-threading in CPython when an application tries to distribute CPU-intensive operations across multiple cores.

However, multi-threading can still pay off for I/O-bound operations.
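To see why, here is an illustration using time.sleep as a stand-in for blocking I/O (a simplification; real I/O behaves similarly in that the GIL is released while waiting), so the waits overlap instead of adding up:

```python
import threading
import time

def fake_io(seconds):
    # time.sleep releases the GIL while waiting, just like blocking I/O
    time.sleep(seconds)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.5,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# Four 0.5 s waits overlap, so total time stays near 0.5 s, not 2 s
print(f"Four overlapping 0.5 s waits took {elapsed:.2f} s")
```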

A great visualization of this can be found in a blog post by Dave Beazley, a Python pioneer.


2 CPU-bound threads running on a machine with a single processing core

He talks about an application that runs two CPU-bound threads on a computer with a single processing core.

This works well and both threads execute concurrently.

When the same dual-threaded application is run on a computer with two processing cores, something weird happens because of GIL.

2 CPU-bound threads running on a machine with two processing cores

In the above, the red regions show the times when the thread on one core cannot run because the thread on the other core is holding the GIL.

Interestingly, GIL makes multi-threading with I/O operations slow as well.

Multi-threading with one I/O bound and another CPU-bound thread

In the above, note that the I/O-bound thread struggles to acquire the GIL from the CPU-bound thread in order to do its processing.

Why Use GIL Then?

GIL was developed because it offered many advantages:

  • It was easy to implement while ensuring thread safety and avoiding race conditions

  • It kept single-threaded applications fast, since a single global lock is cheaper than many fine-grained locks

  • It allowed easy integration of Python with many C libraries that are not thread-safe

Since many Python packages and C extension modules have been developed with the GIL in mind, it is hard to remove the GIL without breaking them.


  • A possible solution is to use an alternative Python implementation that does not have a GIL. Examples are Jython (runs on the JVM) and IronPython (runs on the .NET CLR and Mono).

  • Another is to use multiple processes rather than multiple threads to scale an application (using Python’s multiprocessing package)

  • Finally, the ultimate solution could be to work on removing the GIL from CPython (popularly called “Gilectomy”).

This hasn’t succeeded yet because of the hard requirements placed on any GIL removal, chief among them that single-threaded performance must not regress.

You can read about the requirements here:

You can also check out the gilectomy branch of the repository below for more updates on this work.
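Of the options above, multiprocessing is usually the most practical today. As a sketch, reusing the countdown workload from earlier (the iteration count is still an assumption): each worker process gets its own interpreter and its own GIL, so CPU-bound work can run truly in parallel.

```python
import time
from multiprocessing import Pool

def countdown(n):
    # Same CPU-bound busy loop as before
    while n > 0:
        n -= 1

if __name__ == "__main__":
    COUNT = 20_000_000  # assumed workload, split across two worker processes
    start = time.perf_counter()
    with Pool(processes=2) as pool:
        # Each process has its own GIL, so the halves run in parallel
        pool.map(countdown, [COUNT // 2, COUNT // 2])
    print(f"Two processes: {time.perf_counter() - start:.3f} s")
```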

(Note: This post was originally published here.)


That’s everything for this article! Thanks a lot for reading!

Let us know your thoughts and experiences on this topic in the comments. Happy coding!