No matter how many times I read Michal’s post I still don’t get it

Michal Zalewski wrote a post about the responsibilities a security researcher has when it comes to vulnerability discovery.
Now I’m not much of a bug hunter myself nor I do this for a living so I can’t say for sure how much time it’s needed to find such bugs. I will instead try to reason on some numbers provided by Dave Aitel from Immunitysec. On dailydave he wrote that it takes about 3 person-months to have a reliable exploit on Windows. A less reliable exploit and for more OSes than just Windows would take on average 1 person-month.
The reasoning behind Michal’s post is that you should give away bugs for free to obtain:
1) peer recognition
2) interesting job offers
3) public shame on negligent(security-wise) companies

Now I would say that to get the first two of them and stay relevant in the community it takes about 3-4 exploits per year(this is by no means the “perfect guide to infosec fame” but I guess that’s a good assessment of what it takes to obtain what Michal said).

So to summarize what you need to do is to spend 4 months of your life each year to: make sure your peers like you, make sure big companies making billions per year are mocked in public and finally have interesting job offers that you probably don’t care about. Not only you can obtain all three of them in thousands of other ways but also as you can easily understand none of those pay your bills. So what we are left with are:

1) Rich people that in their spare time decide to make a name in the infosec industry
2) Skilled people who are looking for a new job (notice though that after they obtain it they will stop doing such research)
3) Criminals and people employed by the aforementioned companies (notice in that case that neither of them has the slightest interest in making the information public to everyone)

In the end this looks to me like a typical “Quis custodiet ipsos custodes?” situation in which everyone has to blindly trust either random and not very motivated people (1 and 2) or companies who will publically ashame themselves by telling the world how much they suck when it comes to security.
Somebody may argue that this is exactly what happens in real life with physical security, eg: law enforcement and everything related to this. There is one objection to that though: you get to vote your president, you don’t get to vote the CEO of a random software company.
At the end of the day you need to chose the least evil and if it is having companies like iDefense, TippingPoint and so forth pay researchers to then apply “responsible disclosure” or convince companies to pay people to have their bugs I’d say that this is definitely the least evil we can wish for.. and I’m not even sure that this is “evil” at all.

News

Hey my dear readers,
It’s been a while. I was caught by various stuff including holidays and material for Black Hat but I’m back with some good news!
Zynamics now has a blog, the whole team will be writing about RE, our products and other interesting stuff! Speaking of which here’s my first blog post there. Given the limited about of time I have the new policy for blog posting will be: for technical stuff I’ll write first on one of the two blogs and after a week or so it will be posted on the other one. For personal stuff/rants I’ll keep using exclusively this blog.
Bottom line make sure to keep an eye on both blogs!

Let’s move on the second news. I’ll be speaking at BH DC and after that I’ll be staying a few days in NYC, if you want to chat with me about RE, vuln-dev, zynamics products or just offer me a beer send me an email!

That’s all for now.
Cheers,
Vincenzo

Finding interesting loops using (Mono)REIL

Natural loops detection is a well understood and useful to solve problem for vulnerability analysis.
Peter Silberman wrote a while ago a very interesting article on the subject, I strongly suggest you to take a look at it.
In this blog post we will see how to detect natural loops(from now on just “loops”) and how to have a metric to decide whether or not a given loop is interesting using REIL and MonoREIL.
The algorithm used for detecting loops is pretty straightforward, first off we need to build a dominator tree out of a function, then for each node we calculate the list of dominators, finally for each dominated node we check whether there’s a backedge to one of the dominators; if a backedge is found then we have a loop. This might be a little confusing at first, so let’s proceed step by step.
A dominator tree is a tree where leafs are dominated by their parents. Or better a node d dominates node k if the only way to go from the start node to k it’s through d. As usual you better check wikipedia for a formal and possibly more meaningful definition.
So for instance if we have a function like the one shown in this picture:

The dominator tree would look like this:

Notice that I’ve highlighted the nodes of the loop in both graphs in order to make it easier to understand it.
Now we need to find the dominators of each node, for instance the dominators of the node in red are the ones in green in the following picture:
It should be clear by now that if,in the original flowgraph of the function, there’s a link from one of the node to one of its dominators it means that we have a loop.
I told you that this blog post will make use of REIL at some point, so there we go. Take for instance the x86 instruction set, it contains some rather nasty instructions which are “implicit” loops(for instance rep movs). What happens when a rep movs is translated to REIL? the loop is “unfolded”, so in comparison to a normal function flowgraph we are able to determine “implicit” loops without any trickeries.
Let’s stop talking now and see some code:

#the root node is the first node in the graph
dominatorTree = GraphAlgorithms.getDominatorTree(reilGraph, self.findRoot(reilGraph.getNodes()))

Yeah that’s cool, BinNavi is already capable of generating the dominator tree (using the Lengauer Tarjan algorithm so that you know).
self.fillDominatingSets(dominatorTree.getRootNode(), dominatingSets)

    #Here we create the dominators list.
    def fillDominatingSets(self, currNode, resultDict, currentSet=None ):
        """For each node store a set of dominating nodes in a dictionary,
            thus each set contains the parents of a node. NOTE: this function walks the tree recursively"""
        if currentSet == None:
            currentSet = sets.Set()
        #add itself
        currentSet.add(currNode.getObject().getAddress())
        resultDict[currNode.getObject().getAddress()] = currentSet
        for node in currNode.getChildren():
            newNodeSet = sets.Set()
            #store parent's set in the child one
            newNodeSet.update(currentSet)
            self.fillDominatingSets(node, resultDict, newNodeSet)

        return resultDict

Next step is to find backedges.
	
    for node in reilGraph.getNodes():
            currAddr = node.getAddress()
	    #get all children
            childrenAddr = [child.getAddress() for child in node.getChildren()]
            for address in childrenAddr:
                #one of the children is dominated by the parent
                if address in dominatingSets[currAddr]:
                    parentNode = [n for n in reilGraph.getNodes() if n.getAddress() == currAddr][0]
                    childNode = [n for n in reilGraph.getNodes() if n.getAddress() == address][0]
                    #get all parents
                    parentSet = get_all_parents( parentNode, childNode )
                    #get all children
                    childrenSet = get_all_children( childNode, parentNode)
                    #the nodes in the loop are the ones that are contained in both children and parents
                    results.append((f, parentSet.intersection(childrenSet)))

I’d like to thank Halvar Flake for providing me with part of this code from a previous implementation of the this algo he wrote.
So what now? We have all the loops in a binary but we’d like to have only the interesting ones to analyze by hand.
First off let’s give a definition of “interesting” in this case, we consider a loop to be interesting when it writes some memory at different locations(yeah I know this is not the most interesting definition of “interesting” vulnerability analysis-wise).
For this purpose we will use a reaching definitions algorithm.
This is not the best and most comprehensive way of finding interesting loops but is by far the simplest and it’s also quite effective for explaining MonoREIL.
Reaching definitions is a typical dataflow analysis algorithm used to determine at a given program point p which definitions/assignments can reach p.
To avoid making this post longer than it already is I won’t explain how the algorithm works, but instead will point you to wikipedia. Note though that without reading the explaination the rest of the post will be pretty much meaningless to you. So again, wikipedia!
The easiest way to implement this algorithm is to use abstract interpretation (and for those of you who prefer something less formal here’s another very useful link) on an intermediate language.
We have both: MonoREIL and REIL.
So what do we need in order to write our MonoREIL algorithm? the flowgraph of a function, a way to walk the graph, a lattice and a way to combine lattice elements, the effect that each REIL instruction has on the lattice and a initial state. Quite a few things..
So let’s go back to the reaching definitions algorithm. The initial state is empty at each program state.
# This function creates the initial state of the state vector passed to the
# monotone framework. In the beginning the state of each instruction is defined
# as empty.
def generateStartVector(graph):
    startVector = DefaultStateVector()

    for node in graph:
        element = SkeletonLatticeElement()
        startVector.setState(node, element)

    return startVector

As described in the wikipedia article the algorithm needs a Reach_in and a Reach_out set. In order to be able to determine when the iteration reaches its fixed point we need a way to know when two elements are equal. Also MonoREIL can only work if the lattice satisfies the Ascending chain condition, therefore we need a way to determine whether an element is less than another one. Before showing you some code let me say a few words on lattices, as usual for a formal definition go read here. Why do we use lattices in  our code? Well abstract interpretation is all about taking some concrete values that are likely to be present in one of the execution paths of the code and then put all the execution paths together to gather a “set” of abstract values(that therefore are “path” independent). (yeah I suck a bit at explaining stuff). Lattices,  are partially ordered sets that can hold all the concrete values and can give us a way to “merge” them, in fact each lattice has two operators:

and

Which can be seen as “union” and “intersection”, hence we are able by the mean of those operators to merge different elements in one. Of course the reason for using lattices is not just that , but then we would get too mathy.  All that said what a lattice is and what “union” and “intersection” are  code-wise highly depends on the algorithm you’re shaping(eg: there’s not a template for a lattice that we can use each time). Let’s now take a loop at a lattice element in our algorithm:

# This class is used for the elements of the lattice. Each lattice element
# is used to keep track of the known state for a REIL instruction during
# analysis. Since this plugin keeps track of reached variables, the kept
# state says what definitions are present at the beginning and at the end of state.
class SkeletonLatticeElement(ILatticeElement):
    def __init__(self):
        self.inVal  = Set()
        self.out = Set()

    def equals(self, rhs):
        #  This function helps MonoREIL to end the fixed point iteration
        return self.out == rhs.out

    def lessThan(self, rhs):
        # This function helps MonoREIL to check the monotonous requirement.
        return self.inVal < rhs.inVal and self.out < rhs.out

Now we need a way to combine lattices, for that we need to refer to the algorithm:

So in essence inVal is determined by the union of the out states of the parents states:

# This class defines the lattice used by the monotone framework. Its only
# purpose is to defined a function that is used to combine a list of states
# into one state.
class SkeletonLattice(ILattice):
    def combine(self, states):
        combinedState = SkeletonLatticeElement()
        for state in states:
            combinedState.inVal = combinedState.inVal.union(state.element.out)

        return combinedState

The next question we need to ask ourserlves is: how does a REIL instruction influence a given state?
# This class provides the transformations each instruction has on a state. For
# each instruction of the instruction graph, the current state of the instruction
# and the combined state of the influencing nodes is passed to the function.
# The function returns the state of the instruction while considering the input
# states.
class SkeletonTransformationProvider(ITransformationProvider):
    def transform(self, node, currentState, influencingState):

        #assignments made by this node
        gen = Set()
        #assignments killed by this node
        kill = Set()
        transformed_state = SkeletonLatticeElement()
        #inVal is equal to the combinedState inVal we computed in combine()
        transformed_state.inVal.update(influencingState.inVal)

        instruction = node.getInstruction()
        #the third operand is always the defined one except from jumps and comparisons instructions
        if instruction.thirdOperand.value != '' and instruction.mnemonic not in ("jcc", "bisz"):
            #defs is the set that contains all the definitions made in the code. This method is called more than once so avoid duplicates
            if (instruction.thirdOperand.value, instruction.getAddress()) not in defs:
                defs.add((instruction.thirdOperand.value, instruction.getAddress()))
            #a new definition was made in the node
            gen.add((instruction.thirdOperand.value, instruction.getAddress()))

        #kill is defined as: defs - gen
        for (operand, address) in defs:
            if address != instruction.getAddress() and operand == instruction.thirdOperand.value:
                kill.add((operand, address))

        #out is defined as: gen U (inVal - kill)
        transformed_state.out.update(gen.union(
           transformed_state.inVal.difference(kill)))

        return transformed_state

Finally let’s glue the code together:
    def doAnalysis(self): 

        # Generally the monotone framework works on graphs where each node represents
        # a REIL instruction. For this reason there is a helper function that creates
        # this instruction graph from a REIL graph.
        self.instGraph = InstructionGraph.create(self.reilGraph.graph)

        # Define the lattice used by the monotone framework.
        lattice = SkeletonLattice()

        # Generate the initial state vector.
        startVector = generateStartVector(self.instGraph)

        # Define the transformations used by the monotone framework.
        transformationProvider = SkeletonTransformationProvider()

        # Reaching definitions starts at the beginning of a function and moves
        # downwards, so we use the default DownWalker class to move through
        # the graph.
        walker = DownWalker()

        # Use the monotone framework to find what registers are defined by the current function.
        solver = MonotoneSolver(self.instGraph, lattice, startVector, transformationProvider, walker)
        results = solver.solve()

        return results

How does this help us with out initial problem?
In a loop we are interested in the store to memory REIL instruction “stm”, so if the last operand wasn’t present in the reach_in state it means that either is a literal (fixed memory location) or it’s a register whose value was inherited from another function (but hey interprocedural analysis is *hard*). If instead the last operand is present the stm is likely to be writing memory to a variable location hence the loop is interesting for us.
 	results = self.doAnalysis()

        for node in self.instGraph:
            if node.getInstruction().getAddress() not in self.addresses:
                continue
            for (operand, address) in results.getState(node).inVal:
                if node.getInstruction().thirdOperand == operand:
                    return False

           return True

Scripting with BinNavi – Cyclomatic Complexity

My collegue Sebastian announced a while ago that from version 2.1 of BinNavi it is possible to run BinNavi scripts without BinNavi itself. Let’s see an example on how to write a quick script using jython.
In our dummy example we will calculate the cyclomatic complexity of each function of a given binary.
To keep it short and simple cyclomatic complexity is a metric used to see how complex in terms of code paths a piece of code is; for a more detailed and precise description you better look at wikipedia.
This metric can be quite useful for a number of tasks, for example to have an idea of where bugs may be(yeah ok, it’s debatable but let’s not start flames in the first post).
First off we need to include a few files:

sys.path.append(os.path.join(os.path.dirname(__file__), "BinNavi.jar"))
sys.path.append(os.path.join(os.path.dirname(__file__), "REIL.jar"))
sys.path.append(os.path.join(os.path.dirname(__file__), "mysql-connector-java-5.1.8-bin.jar"))

Then, in order to make the script running standalone we need to use this line:

binNaviProxy = StandAlone.getPluginInterface()

BinNavi relies on a database to store information on binaries, so before being able to do any interesting work we first need to locate the binary we are interested in.

Just as a quit recap for those of you not familiar with BinNavi functions are stored inside “modules” which can optionally be stored inside “projects”.

So let’s see how to locate functions of interest:

class BinNaviDatabaseInterface(object):
    def __init__(self, name, driver, url, user, passwd, isProject = True): 

        dbm = binNaviProxy.databaseManager
        dbm.addDatabase("", driver, url, user, passwd, False, False)

        self.database = dbm.databases[0]
        self.database.connect()
        self.database.load()

        if isProject:
            self.projectName = name
            self.moduleName = None
        else:
            self.moduleName = name
            self.projectName = None

        self.isProject = isProject

    def retrieveFunctions(self):
        mods = []
        functions = []

        if self.isProject is True:
            self.findProject()
            if self.project is None:
                return None
            mods.extend(self.getProjectModules())
        else:
            self.findModule()
            if self.module is None:
                return None
            mods.append(self.module)

        for module in mods:
            if module.isLoaded() is False:
                module.load()
            functions.extend(module.getFunctions())

        return functions

I won’t bother you with details on the code and on how to find modules or projects(also because it’s pretty ease to understand).

Now let’s move to the interesting bit, let’s calculate cyclomatic complexity:

class BinNaviComputeCC(BinNaviControl.BinNaviOps): 

    def getCC(self, start_addr= 0, end_addr= 0xffffffff):
        #exclude functions that don't belong to the given range
        functions = []
        complexities = []

        fn = self.naviDB.retrieveFunctions() 

        if fn is None:
        	return None

        for f in fn:
            if start_addr <= f.getAddress().toLong() <= end_addr:
                functions.append(f)

        print "Total functions found" , len(functions)
        counter = 0
        for f in functions:
            counter +=1
            #the cyclomatic complexity is calculated as follow: CC = Edges - Nodes + 2
            complexities.append((f, f.getEdgeCount() - f.getBlockCount() + 2))
            print "Function %s done.. %d left\n function edges: %d\n function nodes: %d\n" % (f.getName(), len(functions) - counter, f.getEdgeCount(), f.getBlockCount())
            if f.isLoaded():
                f.close()

        if len(complexities) is 0:
            return None

        #sort by cyclomatic complexity value
        complexities.sort(key=operator.itemgetter(1), reverse=True)

        return complexities
It’s as easy as it gets. First we find all the functions, then we filter them in order to select only the ones in a given address range, finally for each function we calculate the cyclomatic complexity (Number of Edges – Number of Nodes + 2).
Let’s run it:
nemesis:~/Codes/navicoverage snagg$jython naviCC.py "Flash Player" 127.0.0.1 binnavi2.2 naviuser navipassword false
Total functions found 13975
Function sub_1820 done.. 13974 left
 function edges: 0
 function nodes: 1
...
[cut]
nemesis:~/Codes/navicoverage snagg$head navi-cyclomatic
Function sub_567DE0 has value 1716
Function sub_538910 has value 1067
Function sub_114290 has value 1022
Function sub_4D3330 has value 933
Function sub_1E2720 has value 686
Function sub_55FC60 has value 619
Function sub_46F80 has value 568
Function sub_32B50 has value 565
Function sub_3201A0 has value 564
Function sub_508420 has value 539
And now let’s take a look inside BinNavi at sub_567de0:
It looks pretty complex indeed!

First post – Egomaniac

Here we are,

I used to have some sort of blog on this cool site once, unfortunately although it was pretty awesome and easy to use it’s definitely not the right place to paste code and stuff of this kind;  after a while it became more like a twitter mirror than a real blog. So what I did was pretty much to “cloudsource” my works/thoughts/code/whatever till it got to a point where I wasn’t even able to track down things anymore.

The idea behind this site is to collect all the aforementioned things and have a place to store them all. I also promise to try to keep this blog interesting(at least security-wise). Without wasting more time let’s kick this thing and see what comes out of the site.