MelVM: low-level covenant VM

MelVM: low-level covenant VM


Overview

MelVM is the low-level, stack-based virtual machine that powers Themelio covenants. By only allowing looping for a fixed number of iterations, MelVM makes the worst-case cost of each covenant computable, trading off Turing completeness. Unlike Bitcoin scripts, however, MelVM covenants can compute all primitive recursive functions, allowing expressing almost all “interesting” constructs.

Memory model

MelVM uses a Harvard architecture where the code itself is not part of the memory space. Memory is divided into two regions: the stack and the heap. Except when data is transferred between the stack and the heap, MelVM instructions directly operate only on stack data, pushing and popping items from the top of the stack.

A big difference from most low-level VMs is that the basic unit of data isn’t a word of binary data, but a recursive data structure:

pub enum Value {
    Int(U256),
    Bytes(im::Vector<u8>),
    Vector(im::Vector<Value>),
}

That is, there are two base types:

  • 256-bit unsigned integers
  • Arbitrary-length bytestrings

As well as a product type: an immutable vector.

The stack is just a FIFO stack of Values, while the heap is a mapping from 16-bit addresses to Values.

Execution model

MelVM takes as input:

/// Heap address where the transaction trying to spend the coin encumbered by this covenant (spender) is put
pub const ADDR_SPENDER_TX: u16 = 0;
/// Heap address where the spender's hash is put.
pub const ADDR_SPENDER_TXHASH: u16 = 1;
/// Heap address where the *parent* (the transaction that created the coin now getting spent)'s hash is put
pub const ADDR_PARENT_TXHASH: u16 = 2;
/// Heap address where the index, at the parent, of the coin being spent is put. For example, if we are spending the third output of some transaction, `Heap[ADDR_PARENT_INDEX] = 2`.
pub const ADDR_PARENT_INDEX: u16 = 3;
/// Heap address where the hash of the running covenant is put.
pub const ADDR_SELF_HASH: u16 = 4;
/// Heap address where the face value of the coin being spent is put.
pub const ADDR_PARENT_VALUE: u16 = 5;
/// Heap address where the denomination of the coin being spent is put.
pub const ADDR_PARENT_DENOM: u16 = 6;
/// Heap address where the additional data of the coin being spent is put.
pub const ADDR_PARENT_ADDITIONAL_DATA: u16 = 7;
/// Heap address where the height of the parent is put.
pub const ADDR_PARENT_HEIGHT: u16 = 7;
/// Heap address where the "spender index" is put. For example, if this coin is spent as the first input of the spender, then `Heap[ADDR_SPENDER_INDEX] = 0`.
pub const ADDR_SPENDER_INDEX: u16 = 8;
/// Heap address where the header of the last block is put. If the covenant is being evaluated for a transaction in block N, this is the header of block N-1.
pub const ADDR_LAST_HEADER: u16 = 9;

List of opcodes

In the “meaning” field, all expressions are evaluated left to right. For example, pop() - pop() means pop a value x, then another value y, and then compute x-y.

Arithmetic

Overflow always wraps. Whether or not the previous instruction overflowed can be queried.

Opcode Encoding Meaning
ADD 0x10 push(pop() + pop())
SUB 0x11 push(pop() - pop())
MUL 0x12 push(pop() * pop())
DIV 0x13 push(pop() / pop())
REM 0x14 push(pop() % pop())

Logic

All operators operate on integers and are bitwise.

Opcode Encoding Meaning
AND 0x20 push(pop() & pop())
OR 0x21 push(pop() | pop())
XOR 0x22 push(pop() ^ pop())
NOT 0x23 push(~pop())
EQL 0x24 push(pop() == pop())
LT 0x25 push(pop() < pop())
GT 0x26 push(pop() > pop())

Cryptography

Operators take in a bytestring and return a bytestring.

Opcode Encoding Meaning
HASH(n) 0x30 + u16be push(blake3(pop()[..n]))
SIGEOK(n) 0x32 + u16be push(ed25519_verify(msg = pop()[..n], pk = pop(), sig = pop()))

Heap access

Opcode Encoding Meaning
LOAD 0x40 push(heap[pop()])
STORE 0x41 heap[pop()] = pop()
LOADIMM(n) 0x42 + u16be push(heap[n])
STOREIMM(n) 0x43 + u16be heap[n] = pop()

Vectors

Despite their appearance, these operations, as well as those for bytestrings, are all quasi-constant-time because vectors and bytestrings can be represented as RRB trees or similar.

Opcode Encoding Meaning
VREF 0x50 push(pop()[pop()])
VAPPEND 0x51 push(pop() ++ pop())
VEMPTY 0x52 push(empty vector)
VLENGTH 0x53 push(pop().length)
VSLICE 0x54 push(pop[pop()..pop()])
VSET 0x55 push(pop[pop() => pop()])
VPUSH 0x56 push(pop().push(pop()))
VCONS 0x57 push(cons(pop(), pop()))

Bytestrings

Opcode Encoding Meaning
BREF 0x70 push(pop()[pop()])
BAPPEND 0x71 push(pop() ++ pop())
BEMPTY 0x72 push(empty vector)
BLENGTH 0x73 push(pop().length)
BSLICE 0x74 push(pop[pop()..pop()])
BSET 0x75 push(pop[pop() => pop()])
BPUSH 0x76 push(pop().push(pop()))
BCONS 0x77 push(cons(pop(), pop()))

Control flow

Opcode Encoding Meaning
JMP(n) 0xa0 Jump n instructions forward
BEZ(n) 0xa1 If pop() is zero, jump n
BNZ(n) 0xa2 If pop() isn’t zero, jump n
LOOP(n, body) 0xb0 + u16be + u16be Loop body n times

Type conversions

Opcode Encoding Meaning
ITOB 0xc0 Pops an integer and converts to bytes
BTOI 0xc1 Pops bytes and converts first 32B to integer
TYPEQ 0xc2 Pops a value and returns what type it is: 0 if integer, 1 if bytes, 2 if vector.

Literals

Opcode Encoding Meaning
PUSHB(bytes) 0xf0 + u8 length + bytes Push the bytes to the stack
PUSHI(int) 0xf1 + u256be Push the integer to the stack