Notice that I’ve highlighted the nodes of the loop in both graphs in order to make it easier to understand it.
It should be clear by now that if,in the original flowgraph of the function, there’s a link from one of the node to one of its dominators it means that we have a loop.
I told you that this blog post will make use of REIL at some point, so there we go. Take for instance the x86 instruction set, it contains some rather nasty instructions which are “implicit” loops(for instance rep movs). What happens when a rep movs is translated to REIL? the loop is “unfolded”, so in comparison to a normal function flowgraph we are able to determine “implicit” loops without any trickeries.
Let’s stop talking now and see some code:
#the root node is the first node in the graph
dominatorTree = GraphAlgorithms.getDominatorTree(reilGraph, self.findRoot(reilGraph.getNodes()))
Yeah that’s cool, BinNavi is already capable of generating the dominator tree (using the Lengauer Tarjan algorithm so that you know).
self.fillDominatingSets(dominatorTree.getRootNode(), dominatingSets)
#Here we create the dominators list.
def fillDominatingSets(self, currNode, resultDict, currentSet=None ):
"""For each node store a set of dominating nodes in a dictionary,
thus each set contains the parents of a node. NOTE: this function walks the tree recursively"""
if currentSet == None:
currentSet = sets.Set()
#add itself
currentSet.add(currNode.getObject().getAddress())
resultDict[currNode.getObject().getAddress()] = currentSet
for node in currNode.getChildren():
newNodeSet = sets.Set()
#store parent's set in the child one
newNodeSet.update(currentSet)
self.fillDominatingSets(node, resultDict, newNodeSet)
return resultDict
Next step is to find backedges.
for node in reilGraph.getNodes():
currAddr = node.getAddress()
#get all children
childrenAddr = [child.getAddress() for child in node.getChildren()]
for address in childrenAddr:
#one of the children is dominated by the parent
if address in dominatingSets[currAddr]:
parentNode = [n for n in reilGraph.getNodes() if n.getAddress() == currAddr][0]
childNode = [n for n in reilGraph.getNodes() if n.getAddress() == address][0]
#get all parents
parentSet = get_all_parents( parentNode, childNode )
#get all children
childrenSet = get_all_children( childNode, parentNode)
#the nodes in the loop are the ones that are contained in both children and parents
results.append((f, parentSet.intersection(childrenSet)))
I’d like to thank
Halvar Flake for providing me with part of this code from a previous implementation of the this algo he wrote.
So what now? We have all the loops in a binary but we’d like to have only the interesting ones to analyze by hand.
First off let’s give a definition of “interesting” in this case, we consider a loop to be interesting when it writes some memory at different locations(yeah I know this is not the most interesting definition of “interesting” vulnerability analysis-wise).
For this purpose we will use a reaching definitions algorithm.
This is not the best and most comprehensive way of finding interesting loops but is by far the simplest and it’s also quite effective for explaining MonoREIL.
Reaching definitions is a typical dataflow analysis algorithm used to determine at a given program point p which definitions/assignments can reach p.
To avoid making this post longer than it already is I won’t explain how the algorithm works, but instead will point you to
wikipedia. Note though that without reading the explaination the rest of the post will be pretty much meaningless to you. So again,
wikipedia!
The easiest way to implement this algorithm is to use
abstract interpretation (and for those of you who prefer something less formal here’s another
very useful link) on an intermediate language.
We have both:
MonoREIL and
REIL.
So what do we need in order to write our MonoREIL algorithm? the flowgraph of a function, a way to walk the graph, a lattice and a way to combine lattice elements, the effect that each REIL instruction has on the lattice and a initial state. Quite a few things..
So let’s go back to the reaching definitions algorithm. The initial state is empty at each program state.
# This function creates the initial state of the state vector passed to the
# monotone framework. In the beginning the state of each instruction is defined
# as empty.
def generateStartVector(graph):
startVector = DefaultStateVector()
for node in graph:
element = SkeletonLatticeElement()
startVector.setState(node, element)
return startVector
As described in the wikipedia article the algorithm needs a Reach_in and a Reach_out set. In order to be able to determine when the iteration reaches its fixed point we need a way to know when two elements are equal. Also MonoREIL can only work if the lattice satisfies the
Ascending chain condition, therefore we need a way to determine whether an element is less than another one. Before showing you some code let me say a few words on lattices, as usual for a formal definition go read
here. Why do we use lattices in our code? Well abstract interpretation is all about taking some concrete values that are likely to be present in one of the execution paths of the code and then put all the execution paths together to gather a “set” of abstract values(that therefore are “path” independent). (yeah I suck a bit at explaining stuff). Lattices, are partially ordered sets that can hold all the concrete values and can give us a way to “merge” them, in fact each lattice has two operators:
and 
Which can be seen as “union” and “intersection”, hence we are able by the mean of those operators to merge different elements in one. Of course the reason for using lattices is not just that , but then we would get too mathy. All that said what a lattice is and what “union” and “intersection” are code-wise highly depends on the algorithm you’re shaping(eg: there’s not a template for a lattice that we can use each time). Let’s now take a loop at a lattice element in our algorithm:
# This class is used for the elements of the lattice. Each lattice element
# is used to keep track of the known state for a REIL instruction during
# analysis. Since this plugin keeps track of reached variables, the kept
# state says what definitions are present at the beginning and at the end of state.
class SkeletonLatticeElement(ILatticeElement):
def __init__(self):
self.inVal = Set()
self.out = Set()
def equals(self, rhs):
# This function helps MonoREIL to end the fixed point iteration
return self.out == rhs.out
def lessThan(self, rhs):
# This function helps MonoREIL to check the monotonous requirement.
return self.inVal < rhs.inVal and self.out < rhs.out
Now we need a way to combine lattices, for that we need to refer to the algorithm:

So in essence inVal is determined by the union of the out states of the parents states:
# This class defines the lattice used by the monotone framework. Its only
# purpose is to defined a function that is used to combine a list of states
# into one state.
class SkeletonLattice(ILattice):
def combine(self, states):
combinedState = SkeletonLatticeElement()
for state in states:
combinedState.inVal = combinedState.inVal.union(state.element.out)
return combinedState
The next question we need to ask ourserlves is: how does a REIL instruction influence a given state?
# This class provides the transformations each instruction has on a state. For
# each instruction of the instruction graph, the current state of the instruction
# and the combined state of the influencing nodes is passed to the function.
# The function returns the state of the instruction while considering the input
# states.
class SkeletonTransformationProvider(ITransformationProvider):
def transform(self, node, currentState, influencingState):
#assignments made by this node
gen = Set()
#assignments killed by this node
kill = Set()
transformed_state = SkeletonLatticeElement()
#inVal is equal to the combinedState inVal we computed in combine()
transformed_state.inVal.update(influencingState.inVal)
instruction = node.getInstruction()
#the third operand is always the defined one except from jumps and comparisons instructions
if instruction.thirdOperand.value != '' and instruction.mnemonic not in ("jcc", "bisz"):
#defs is the set that contains all the definitions made in the code. This method is called more than once so avoid duplicates
if (instruction.thirdOperand.value, instruction.getAddress()) not in defs:
defs.add((instruction.thirdOperand.value, instruction.getAddress()))
#a new definition was made in the node
gen.add((instruction.thirdOperand.value, instruction.getAddress()))
#kill is defined as: defs - gen
for (operand, address) in defs:
if address != instruction.getAddress() and operand == instruction.thirdOperand.value:
kill.add((operand, address))
#out is defined as: gen U (inVal - kill)
transformed_state.out.update(gen.union(
transformed_state.inVal.difference(kill)))
return transformed_state
Finally let’s glue the code together:
def doAnalysis(self):
# Generally the monotone framework works on graphs where each node represents
# a REIL instruction. For this reason there is a helper function that creates
# this instruction graph from a REIL graph.
self.instGraph = InstructionGraph.create(self.reilGraph.graph)
# Define the lattice used by the monotone framework.
lattice = SkeletonLattice()
# Generate the initial state vector.
startVector = generateStartVector(self.instGraph)
# Define the transformations used by the monotone framework.
transformationProvider = SkeletonTransformationProvider()
# Reaching definitions starts at the beginning of a function and moves
# downwards, so we use the default DownWalker class to move through
# the graph.
walker = DownWalker()
# Use the monotone framework to find what registers are defined by the current function.
solver = MonotoneSolver(self.instGraph, lattice, startVector, transformationProvider, walker)
results = solver.solve()
return results
How does this help us with out initial problem?
In a loop we are interested in the store to memory REIL instruction “stm”, so if the last operand wasn’t present in the reach_in state it means that either is a literal (fixed memory location) or it’s a register whose value was inherited from another function (but hey interprocedural analysis is *hard*). If instead the last operand is present the stm is likely to be writing memory to a variable location hence the loop is interesting for us.
results = self.doAnalysis()
for node in self.instGraph:
if node.getInstruction().getAddress() not in self.addresses:
continue
for (operand, address) in results.getState(node).inVal:
if node.getInstruction().thirdOperand == operand:
return False
return True