17 Nov 2021

Garbage collection in Python

Problems with manual memory management:

  • forgetting to free the memory,
  • freeing it up too soon.

A popular method for automatic memory management is reference counting: the runtime keeps track of all references to an object, and when an object has zero references, it can be deleted.

The cost of automatic memory management

  • The program needs additional memory and computation to track references.
  • Many languages use a “stop-the-world” process, where all execution stops while the garbage collector runs.

Python implementation (CPython)

CPython uses:

  • reference counting, can’t be disabled
    • pros: can immediately remove an object when it has no references
    • cons: inability to detect cyclic references
  • generational garbage collection, can be disabled
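
To illustrate the split between the two mechanisms, here is a minimal sketch. It assumes CPython, and the class name Resource and the list events are made up for the example. It switches off the generational collector and shows that an object without cycles is still freed the moment its last reference goes away, because reference counting keeps running.

```python
import gc
import weakref


class Resource:
    """Hypothetical placeholder class, used only for this demonstration."""


events = []

was_enabled = gc.isenabled()
gc.disable()  # turns off only the generational (cycle-detecting) collector
try:
    res = Resource()
    # Register a callback that runs when the object is destroyed.
    weakref.finalize(res, events.append, "freed")
    del res  # last reference removed: reference counting frees the object
    freed_immediately = events == ["freed"]
finally:
    if was_enabled:
        gc.enable()  # restore the previous collector state

print(freed_immediately)
```

On other Python implementations the object may be reclaimed later, so the immediate callback should be read as CPython behaviour rather than a language guarantee.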

Viewing reference counts

To increase the reference count we can:

  • assign an object to a variable
  • add object to a data structure (e.g. a list)
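  • pass the object as an argument to a function

import sys
a = 'some-string'
sys.getrefcount(a)

# Output: 2 (typical in an interactive session; the count includes the temporary
# reference created by passing `a` to getrefcount, and the exact number can vary
# between Python versions)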
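
Because absolute counts depend on the interpreter version, it can be easier to look at how the count changes. The sketch below assumes CPython, and the helper name refcount_deltas is invented for this example. It measures the change relative to a baseline after binding a second name and after adding the object to a list. The function-argument case is left out because the count seen inside a call is more implementation-dependent.

```python
import sys


def refcount_deltas():
    """Return refcount changes relative to a baseline (hypothetical helper)."""
    obj = object()  # a fresh object, so no other code holds a reference to it
    base = sys.getrefcount(obj)

    alias = obj  # binding a second name adds one reference
    after_alias = sys.getrefcount(obj)

    container = [obj]  # storing the object in a list adds another
    after_list = sys.getrefcount(obj)

    del alias  # dropping the name removes one reference
    container.clear()  # emptying the list removes the other
    after_cleanup = sys.getrefcount(obj)

    return (after_alias - base, after_list - base, after_cleanup - base)


deltas = refcount_deltas()
print(deltas)
```

Each element of the returned tuple is the difference from the baseline, so the extra reference held by the getrefcount call itself cancels out.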

Generational garbage collection

What happens when an object holds a reference to itself?

class MyClass(object):
    pass

a = MyClass()
a.obj = a
del a

We’ve deleted the name a, so the instance is no longer accessible, but Python didn’t remove it from memory. Its reference count is not zero because the instance holds a reference to itself. This is called a reference cycle, and reference counting alone cannot clean it up.
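
The cycle described above can be observed with a weak reference, which lets us check whether an object is still alive without keeping it alive. This is a minimal sketch assuming CPython. The helper make_cycle is made up for the example, and automatic collection is paused so it cannot run in the middle of the demonstration.

```python
import gc
import weakref


class MyClass:
    pass


def make_cycle():
    """Create a self-referencing instance and return only a weak reference."""
    a = MyClass()
    a.obj = a  # the instance now refers to itself
    return weakref.ref(a)  # the local name `a` disappears on return


was_enabled = gc.isenabled()
gc.disable()  # keep the automatic collector out of the way for the demo
try:
    ref = make_cycle()
    alive_before = ref() is not None  # still in memory: the cycle keeps it alive
    found = gc.collect()  # run the cycle detector by hand
    alive_after = ref() is not None  # the cycle has been reclaimed
finally:
    if was_enabled:
        gc.enable()

print(alive_before, alive_after, found)
```

gc.collect() returns the number of unreachable objects it found; the exact figure depends on the Python version, so only the before/after liveness is the point here.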

Terminology

  1. Generation:
    1. A new object starts its life in the first generation of the garbage collector.
    2. If Python runs GC and the object survives, it moves to the second generation.
    3. Python GC has three generations.
  2. Threshold:
    1. For each generation the GC has a threshold number of objects. If the number of objects exceeds that threshold, the GC triggers a collection.

Generational garbage collection’s behaviour can be changed: the thresholds can be adjusted, a collection can be triggered manually, or the collector can be disabled altogether.

import gc

gc.get_count()

# Output: (596, 2, 1)

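The first value (596) is the count for the first generation; in CPython it is roughly the number of tracked objects allocated minus those deallocated since the last collection. The second (2) and third (1) values belong to the second and third generations and count how many collections of the younger generation have run since that generation was last collected.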

gc.collect()
gc.get_count()

# Output: (18, 0, 0)
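
Adjusting the thresholds mentioned earlier can be sketched as follows. The value 1000 is an arbitrary number chosen for illustration, not a recommendation, and the default thresholds differ between Python versions, so the sketch reads them at runtime instead of assuming them.

```python
import gc

original = gc.get_threshold()  # a 3-tuple, one threshold per generation
print(original)

gc.set_threshold(1000)  # change only the first-generation threshold
raised = gc.get_threshold()[0]

gc.set_threshold(*original)  # put the previous settings back
restored = gc.get_threshold()

print(raised, restored)
```

Saving and restoring the original tuple keeps the experiment from leaking into the rest of the program.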

General rule: Don’t change the GC’s behaviour

  • Python’s key benefit is that it enables developer productivity
  • If you find that the GC is slowing you down, you might want to invest in a more powerful execution environment instead of tuning the GC
  • Python doesn’t generally release memory back to the OS, so you might not get the results you want from manual GC