Metadata-Version: 2.4
Name: python-control-flow
Version: 1.0.0
Summary: Python Bytecode Control Flow Toolkit
Author-email: Rocky Bernstein <rb@dustyfeet.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/rocky/python-control-flow
Project-URL: Downloads, https://github.com/rocky/python-control-flow/releases
Keywords: Python bytecode,bytecode,disassembler
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Code Generators
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: click
Requires-Dist: xdis<6.3.0,>=6.1.1
Provides-Extra: dev
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Dynamic: license-file

|Combined CI status| |Supported Python Versions|

Introduction
------------

This is a toolkit for extracting control-flow information from Python bytecode.

Specifically, it:

* Creates basic blocks from Python bytecode.
* Creates a control-flow graph from the basic blocks.
* Creates dominator trees and dominator regions for the control flow.
* Graphs the control-flow graph and dominator tree using `dot <https://graphviz.org/>`_.
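
As a rough illustration of the first step, here is a minimal sketch that finds basic-block boundaries using only the standard-library ``dis`` module. It is not this package's implementation (which builds on ``xdis``); the names ``block_leaders`` and ``sample`` are made up for the example, and exception-handler entry points are ignored for brevity. A block starts at the first instruction, at every jump target, and right after any instruction that ends a block:

.. code-block:: python

    import dis

    # Instructions after which control never simply falls through.
    # This list is an assumption covering CPython 3.8 to 3.13 opcode names.
    BLOCK_ENDERS = {"RETURN_VALUE", "RETURN_CONST", "RAISE_VARARGS", "RERAISE"}


    def block_leaders(code):
        """Return the sorted bytecode offsets that start a basic block."""
        instructions = list(dis.get_instructions(code))
        jump_opcodes = set(dis.hasjrel) | set(dis.hasjabs)
        leaders = {instructions[0].offset}  # the entry point starts a block
        for index, instr in enumerate(instructions):
            is_jump = instr.opcode in jump_opcodes
            if is_jump:
                # For jump instructions, dis resolves argval to the target offset.
                leaders.add(instr.argval)
            ends_block = is_jump or instr.opname in BLOCK_ENDERS
            if ends_block and index + 1 < len(instructions):
                leaders.add(instructions[index + 1].offset)
        return sorted(leaders)


    def sample(x):
        if x:
            return 1
        return 2


    # The exact offsets depend on the Python version that runs this.
    print(block_leaders(sample.__code__))

Each leader offset opens a block that runs up to, but not including, the next leader.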


I've used some routines from Romain Gaucher's `equip <https://github.com/neuroo/equip>`_ as a starting point.

*This code is alpha.*
There may be bugs in the code. Right now, we always produce dot graphs, which can be a problem for large bytecode. The insertion of pseudo bytecode instructions was designed for use with the newer grammar-based Python decompiler project; that code may change a bit.

Example
-------

For now, the Python code in ``test/test_bb2.py`` is the best illustration of how things work.

Consider this simple Python program taken from my `BlackHat Asia 2024 talk <https://www.blackhat.com/asia-24/briefings/schedule/index.html#how-to-get-the-most-out-of-the-python-decompilers-uncompyle-and-decompyle---how-to-write-and-read-a-bytecode-decompiler-37789>`_:

.. code-block:: python

    # Program to count the number of bits in the integer 6.
    i: int = 6
    zero_bits = 0
    one_bits = 0
    while i > 0:  # loop point
        # loop alternative
        if i % 2:
            # first alternative
            one_bits += 1
        else:
            # second alternative
            zero_bits += 1
        # join point
        i >>= 1
    # loop-end join point

You can find this byte-compiled to Python 3.8 bytecode in `doc-example/count-bits.cpython-38.pyc <https://github.com/rocky/python-control-flow/blob/post-dominator-refactor/doc-example/count-bits.cpython-38.pyc>`_.
We can get control flow information for this program using::

  python ./test/test-bb2.py doc-example/count-bits.cpython-38.pyc

After running this, you'll find some ``.dot`` files and some ``.png`` images generated for the main routine in ``/tmp``.

``flow-3.8--count-bits.cpython-38--module.png`` is a PNG image for the control flow.

.. image:: https://github.com/rocky/python-control-flow/blob/master/doc-example/flow-3.8-count-bits.cpython-38--module.png

Here is what the colors on the arrows indicate:

red
    The first alternative of a group of two alternatives

blue
    The second alternative of a group of two alternatives

green
     A looping (backward) jump

Here is what the line styles on the arrows indicate:

solid
     An unconditional (and forward) jump

dashed
     The link to the block that follows sequentially in the bytecode.
     This should always be shown as a straight line running from the
     center of one block down to the block below it. If there is an
     arrowhead, there is a fall-through path from the upper block to
     the lower block. If there is no arrowhead, the last instruction of
     the upper basic block is an unconditional jump, a return
     instruction, or an explicit exception-raising instruction.

dotted
     The jump path of a conditional jump. This is usually curved
     and appears on the side of a box.
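
To make the legend concrete, here is a small sketch that emits Graphviz edge statements using the colors and line styles listed above. The edge-kind and role names, the ``dot_edge`` helper, and the block numbers are all hypothetical; this is not how the toolkit itself generates its ``.dot`` files:

.. code-block:: python

    # Colors from the legend above, keyed by hypothetical role names.
    EDGE_COLORS = {
        "first-alternative": "red",
        "second-alternative": "blue",
        "loop": "green",
    }
    # Line styles from the legend above, keyed by hypothetical edge kinds.
    EDGE_STYLES = {
        "jump": "solid",
        "sequential": "dashed",
        "conditional-jump": "dotted",
    }


    def dot_edge(source, target, kind, role=None, falls_through=True):
        """Format one DOT edge statement between two numbered blocks."""
        attrs = [f"style={EDGE_STYLES[kind]}"]
        if role is not None:
            attrs.append(f"color={EDGE_COLORS[role]}")
        if kind == "sequential" and not falls_through:
            # Blocks are adjacent in the bytecode, but control cannot fall through.
            attrs.append("arrowhead=none")
        return f"  block_{source} -> block_{target} [{', '.join(attrs)}];"


    lines = [
        "digraph flow {",
        dot_edge(3, 4, "sequential", role="first-alternative"),
        dot_edge(3, 5, "conditional-jump", role="second-alternative"),
        dot_edge(4, 6, "jump"),
        dot_edge(4, 5, "sequential", falls_through=False),
        "}",
    ]
    print("\n".join(lines))

Saving the printed text to a file and running ``dot -Tpng`` on it renders the four edges in the styles described above.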


We align blocks linearly using the offset addresses. You can find
the offset ranges listed inside the block. The entry block is
marked with an additional border. We also show the basic block number
and block flags.

Any block that is ghost-like or has a white-background box with a dashed border is dead code.

Control-Flow with Dominator Regions
+++++++++++++++++++++++++++++++++++

In addition to the basic control flow, we mark and color boxes with dominator regions.

.. image:: https://github.com/rocky/python-control-flow/blob/master/doc-example/flow%2Bdom-3.8-count-bits.cpython-38--module.png

Regions with the same nesting level have the same color. For example, basic blocks 3 and 7 are at the same nesting level, and blocks 4 and 5 share a nesting level and therefore a color.

Block 6 has two jumps into it, so it is "inside" neither block 4 nor block 5. Block 6 is the "join point" block after an if/else::

   # block 3
   if i % 2:
       # block 4
       one_bits += 1
   else:
       # block 5
       zero_bits += 1
   # join point
   i >>= 1  # This is block 6

Blocks 4, 5, and 6 are all dominated by block 3, the head of the block region. Block 3 has a border around it to show that it is the head of a block region.

A border is put around a block *only* if it dominates some *other* block. So while technically block 4 dominates itself, and block 5 dominates itself, that fact is not interesting.
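
The rule can be sketched with the textbook iterative dominator computation. The edge list below is an assumption shaped like a loop containing an if/else; it is not read from the actual graph, and ``dominators`` is an illustrative helper rather than this package's API:

.. code-block:: python

    def dominators(successors, entry):
        """Iteratively compute the set of dominators of every node."""
        nodes = set(successors)
        predecessors = {node: set() for node in nodes}
        for node, targets in successors.items():
            for target in targets:
                predecessors[target].add(node)

        # Start from "every node dominates every node" and shrink to a fixed point.
        dom = {node: set(nodes) for node in nodes}
        dom[entry] = {entry}
        changed = True
        while changed:
            changed = False
            for node in sorted(nodes - {entry}):
                preds = [dom[p] for p in predecessors[node]]
                new = {node} | (set.intersection(*preds) if preds else set())
                if new != dom[node]:
                    dom[node] = new
                    changed = True
        return dom


    # Hypothetical edges: block 2 is the loop test, block 3 the "if".
    cfg = {1: [2], 2: [3, 7], 3: [4, 5], 4: [6], 5: [6], 6: [2], 7: []}
    dom = dominators(cfg, entry=1)

    # A block gets a border only if it dominates some block other than itself.
    region_heads = sorted(
        head for head in cfg if any(head in dom[n] for n in cfg if n != head)
    )
    print(region_heads)  # prints [1, 2, 3]

In this sketch, block 6 is dominated by block 3 but by neither block 4 nor block 5, because it can be reached through either branch.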


Colors get darker the more deeply a region is nested.


In addition, if a jump or fallthrough leaves its dominator region,
the arrowhead of that edge is shown in brown. Note that a jump arrow
from an "if"-like or "for"-like statement to its end will not be in
brown; only the "fallthrough" arrow will be. This is why the
arrowhead of the jump from block to block 7 is blue, not brown.

If any basic block is jumped to using a jump-out (or end scope) kind of edge, then the box has a brown outline.

Inside the block text, we now add the dominator region number for a block in parentheses. For example, basic blocks 4 and 5 are in dominator region 3, and so are marked "(3)" after their basic block number. A dominator region takes its number from the basic block at its head, so basic block 3 is the head of dominator region 3.

Note that even though basic blocks 4 and 5 are at the same indentation level, they are in different *scopes* under basic block 3.

In this example, every conditional jump is taken when the condition is false. When a conditional jump is instead taken when the condition is true, we draw its dotted blue arrow in bold. By doing this and by showing whether the jump condition is true or false, you can see in the control flow whether the source text contains an "and" type of condition or an "or" type of condition.

Here is the graph for ``x and y``:

.. image:: https://github.com/rocky/python-control-flow/blob/master/doc-example/flow%2Bdom-3.9-and-lambda%3Ax-y.png

Note that the graph would be the same for ``if x: if y: ...``.

The graph for ``a or b`` is almost the same, except for the style of the blue dotted arrow:

.. image:: https://github.com/rocky/python-control-flow/blob/master/doc-example/flow%2Bdom-3.9-or-lambda%3Aa-b.png
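
The difference in jump condition is also visible directly in the bytecode. The following sketch uses only the standard-library ``dis`` module; ``conditional_jumps`` is a helper invented for the example, and the exact opcode names it reports vary between Python versions:

.. code-block:: python

    import dis


    def conditional_jumps(func):
        """Return the names of the conditional jump instructions in func."""
        jump_opcodes = set(dis.hasjrel) | set(dis.hasjabs)
        return [
            instr.opname
            for instr in dis.get_instructions(func)
            if instr.opcode in jump_opcodes and "IF_" in instr.opname
        ]


    and_jumps = conditional_jumps(lambda x, y: x and y)
    or_jumps = conditional_jumps(lambda a, b: a or b)

    # "and" skips its second operand when the first is false;
    # "or" skips its second operand when the first is true.
    print(and_jumps, or_jumps)

On the CPython versions listed in the classifiers above, the ``and`` lambda should report a jump whose name contains ``IF_FALSE``, and the ``or`` lambda one whose name contains ``IF_TRUE``.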

.. |Combined CI status| image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/rocky/python-control-flow/master/.github/combined-ci-status.json
.. |Supported Python Versions| image:: https://img.shields.io/pypi/pyversions/python-control-flow.svg
