Skip to content
Jukka Lehtosalo edited this page Jun 29, 2022 · 6 revisions

If you run mypy using the mypy tool, it performs incremental checking by default which reuses type checking results from the previous run via files in the .mypy_cache directory.

This coarse-grained (module-level) incremental checking is often enough, but it can result in a slow iteration speed in large code bases with hundreds of thousands of lines.

The mypy daemon makes incremental runs very fast (often a few hundred ms) by having a long-lived process that stores program state in memory between runs.

The daemon also maintains fine-grained dependencies. These dependencies are between definitions, not modules.

Example

Assume that we have a file a.py that contains dozens of functions, only one of which uses module b:

# a.py

import b

...  # lots of stuff that doesn't use b

def f() -> None:
    b.g(1)

The mypy daemon will process the entire program when starting up, and record that a.f depends on b.g, among other things.

Now we edit b.py and change the signature of b.g:

# b.py

def g(x: str) -> None: ...

Mypy daemon will first notice that the file b.py is changed (by checking the modification time). It will process b.py again and notice that the only externally visible change in b is the modified signature of b.g.

Using fine-grained dependencies, mypy daemon will figure out that in module a, we'll need to reprocess only the function a.f. This is much faster than processing the entire module a if the module is big.

After reprocessing a.f, mypy daemon will report an incompatible argument error.

How does it work

The mypy daemon has several things going on, summarized below.

Fine-grained dependencies

Mypy daemon keeps track of fine-grained dependencies between functions, attributes, classes, modules, etc.

The smallest unit of (re)processing is either a module top level or a top-level function/method. We only track dependencies at this level of granularity.

Processing modified files

Each file that is modified since the previous mypy daemon run will be processed in full, since mypy doesn't have an incremental parser.

After processing a file, the daemon takes a diff of the old and new symbol tables to find changed definitions. It will then "trigger" these definitions (i.e. reprocess everything that depends on them).

Now comes a tricky bit: we merge the new AST to the AST corresponding to the previous revision of the file by copying the contents of new AST nodes over to the corresponding old nodes (when they exist). This way references to the AST nodes in other modules will continue to point to the correct things.

Following triggered fine-grained dependencies

Mypy daemon uses fine-grained dependencies to find other parts of the codebase that need to be reprocessed in response to the triggered definitions.

To reprocess a triggered definition, the daemon first "strips" (or resets) the relevant AST nodes to match a fresh AST we get from the parser. We use this hack, since we don't have an incremental parser and want to avoid processing the entire file containing a triggered definition.

Reprocessing a definition implies performing semantic analysis and type checking on a stripped subset of some module AST.

After reprocessing, we again check if something externally visible has changed, and we may need to also trigger the dependencies of definitions we just reprocessed.

Completing an incremental step

Eventually we'll reach a fixed point where there are no additional triggered dependencies and we are done.

Relevant code

  • mypy/dmypy_server.py: Implementation of the daemon process
  • mypy/server/deps.py: Generate fine-grained dependencies for AST nodes
  • mypy/server/update.py: Fine-grained incremental processing logic
  • mypy/server/astdiff.py: Compare two versions of a module symbol table and find changed definitions
  • mypy/server/astmerge.py: Merge new version of an AST to the previous version to preserve object identities of nodes
  • mypy/server/aststrip.py: Strip an AST to make it "fresh", i.e. similar to what is produced by parser, with all changes performed by semantic analysis or type checking reverted

Test cases

Many daemon test cases perform multiple incremental steps to validate that propagating changes works as expected.

The primary test files are here: test-data/unit/fine-grained*.test

There are also more unit test style test cases for some operations:

  • Generating deps (test-data/unit/deps.test)
  • Performing AST diffs (test-data/unit/diff.test)
  • Merging ASTs (test-data/unit/merge.test)