Apr 4, 2019

Writing a weird turing machine in C, Part 2

I like weird machines. Bonus points if they are small, or fast, or extra weird.

In the previous post, we designed a weird machine summation function:

isize sum(isize a[], isize len) {
    isize *m = calloc(2, sizeof(isize));
    for(
        ;
        m[0] < len || (len = m[1])*0;
        m[1] += a[m[0]++]
    );
    free(m);
    return len;
}

Now, let’s try and turn this into a turing machine. Let’s make it interpret some invented binary instructions, since they would be far easier to implement than some concrete ISA with real specifications to worry about.

Let’s begin by writing some useful #DEFINE statements to make our mess of code readable. There are a few sections to add, so let’s go one by one:

#pragma once

// DEFS
#define u8 uint8_t
#define u16 uint16_t
#define u32 uint32_t
#define usize uintptr_t
#define false 0
#define true 1

Here we have the usual #pragma once, not that it’s needed in this single-file program, but it’s good practice to keep. After, we define some shorthand for types we’re going to be using a lot: one byte, two byte, four byte, and system pointer sizes, as well as boolean values (just to be explicit). In case you aren’t familiar with with it beforehand, these are the same type names that the Rust language uses. After come the word size definitions, followed by some nifty preprocessor functions to help us access our memory buffer a bit more seamlessly:

// Sizes in bytes
#define WORD_SIZE 2
#define DOUBLEWORD_SIZE 4
#define QUADWORD_SIZE 8

// Memory Accessors
#define m8(addr)    ( *((u8*)(m+addr)) )
#define m16(addr)   ( *((u16*)(m+addr)) )
#define m32(addr)   ( *((u32*)(m+addr)) )
#define msize(addr) ( *((usize*)(m+addr)) )

// Payload Accessors
#define BPC 0x00
#define BPC_OP pl[m32(BPC)]
#define BSR 0x01 * DOUBLEWORD_SIZE

💡

Our imaginary instruction set will keep the program counter at 0x00, and the status register at 0x04 (the second doubleword).

That’s a lot of #DEFINEs, but the preprocessor will doing us a favor by doing a lot of the heavy lifting for us. We also define a little macro to help us get the instruction stored at whatever location in the payload the program counter points to.

Now that we know what this part of our environment looks like, let’s set it up. We’ll set the dynamic memory we’ll be using for our machine to 0xFFFF bytes, which should me more than enough. We’ll also keep the call to calloc(), since we’d prefer to eschew the need to zero the memory ourselves. Because we can read and execute directly from the input buffer as if it were ROM, we get to use the whole memory buffer for ourselves.

// main.h
#define BSR 0x01*DOUBLEWORD_SIZE

// main.c
usize run(u32 *payload, usize p_len) {
    u8 *m = calloc(0xFFFF, sizeof(u8));
    for(
        ;
        ( !(m32[BSR] & 0x01) && /* Cleanup Block (must evaluate to false) */ );
        /* Inc Block */
    );
    free(m);
    return p_len;
}

You may notice I also chose to break the loop if the least significant bit of a special memory location is set - this will be our virtual machine’s internal state flag register and will be kept at the second doubleword in memory, immediately after the program counter. In this case, our least significant bit will be our DMP flag - when it is set, all execution will stop, cleanup will be run, and a pointer to internal memory will be returned instead of an error code. This may change later, but it gives us a good starting point to get memory dumps later on.

Finally, the type of payload has been changed to an array of u32s, since each instruction is encoded as a doubleword. Now, let’s write a basic cleanup block to run when DMP is set:

((p_len =  (usize)m)*0 || (u8)(m = malloc(0))*0)

Some odd computation is happening in our cleanup block - When we want to get a memory dump for inspection when DMP is set, we want to return a pointer to m. However, before we return it, we free m. What we do to get around that issue is simple: we set p_len to the address of m, then change m to point to a zero-length allocation, so that can be safely freed, and our actual memory buffer can be passed out. Since this is intended for crash dumps, I think it’s safe to let the calling code free m. We are also using a second lazy OR - this is to force a sequence point that we won’t get if we use plain old arithmetic instruction chains.

Before compiling, let’s add a home-made debug macro we can use to decode instructions and print them:

// run(payload, payload_length)
usize run(u32 *pl, usize p_len) {
    u8 *m = calloc(0xFF, sizeof(u8));
    for (
        ;
        ( dbg0 ) | (
            ( (m32(BSR) & 0x00000001) && ((p_len = (usize)m)*0 | (usize)(m = malloc(0)) * 0) )
        );
        /* Inc Block */
    );
    free(m);
    return p_len;
}

❓

What this macro does is outside the scope of this article – it's purpose is to pretty-print decoded instructions before they are executed. If we add #define DEBUG to the top of our source file, it will expand DBG0 and DBG1 to the debug display code. Otherwise, it will be expand to the values 0 and 1, respectively.

This finally compiles, and if you set m32[BSR] = 0x01 in the increment block, you will indeed see the loop end and return a pointer.

Now let's define some conventions that will benefit us in this scenario: namely the fact that all instructions and their operands will be encoded as 32 bit doublewords. This makes it much easier to decode and execute than your average variable-length instruction set.

Let’s implement some actual instruction interpretation logic in our weird machine, starting with some defines to help us mask out relevant information:

// MASKS
// Mask out ops and flags
#define OP_M(ins)   (ins & (u32)0xE0000000)
#define FLG_16(ins) (ins & (u32)0x10000000)
#define FLG_32(ins) (ins & (u32)0x08000000)
#define FLG_A(ins)  (ins & (u32)0x04000000)
#define FLG_B(ins)  (ins & (u32)0x02000000)
#define FLG_C(ins)  (ins & (u32)0x01000000)

// Get the individual byte values
#define BT0_M(ins)  ((ins & (u32)0xff000000) >> 24)
#define BT1_M(ins)  ((ins & (u32)0x00ff0000) >> 16)
#define BT2_M(ins)  ((ins & (u32)0x0000ff00) >> 8)
#define BT3_M(ins)  (ins & (u32)0x000000ff)

// Get the individual arguments
#define ARG_M(ins)  (ins & (u32)0x00ffffff)
#define ARG1_12_M(ins) ((ins & (u32)0x00fff000) >> 12)
#define ARG2_12_M(ins) (ins & (u32)0x00000fff)
#define ARG1_16_M(ins) ((ins & (u32)0x00ff0000) >> 16)
#define ARG2_16_M(ins) (ins & (u32)0x0000ffff)

// Memory Locations
#define BINS_MAXMEM 0xFF
#define BPC 0x00
#define BSR 0x01*DOUBLEWORD_SIZE

❗

Note that our BSR register and our BPC (which we included beforehand) are defined within this block. Be careful not to include them twice!

Here we have some C preprocessor functions to mask out the opcode, flags, whole argument and partial argument fields, along with some memory locations for stuff like the program counter and the status register. We are only implementing 12-bit and 16-bit modes, so we only need masks for the 8-bit, 12-bit, and 16-bit operands. Let’s go ahead and use them:

// run(payload, payload_length)
usize run(u32 *pl, usize p_len) {
    u8 *m = calloc(0xFFFF + 0xFF, sizeof(u8));
    for (
        ;
        ( dbg0 ) | (
            ( (m32(BSR) & 0x00000001) && ((p_len = (usize)m)*0 | (usize)(m = malloc(0)) * 0) ) ||
            ( m32(BPC) < p_len-1 )
        );
        (
            (/* MOV */ (OP_M(BPC_OP) == 0x00000000) && () ) ||
            (/* ADD */ (OP_M(BPC_OP) == 0x20000000) && () ) ||
            (/* SUB */ (OP_M(BPC_OP) == 0x40000000) && () ) ||
            (/* OR  */ (OP_M(BPC_OP) == 0x60000000) && () ) ||
            (/* NOT */ (OP_M(BPC_OP) == 0x80000000) && () ) ||
            (/* INT */ (OP_M(BPC_OP) == 0xa0000000) && () ) ||
            (/* CMP */ (OP_M(BPC_OP) == 0xc0000000) && () ) ||
            (/* JMP */ (OP_M(BPC_OP) == 0xe0000000) && () ) ||
            (m32(BSR) = m32(BSR) ^ 0x00000001) // Failure. Unknown Opcode, set DMP Flag.
            ) & ( m32(BPC) += 1 ))
        ;
    free(m);
    return p_len;
}

Let’s run through the changes from top to bottom:

First, we add a program counter conditional underneath the DMP flag check. This will simply cause execution to end if we reach the end of the instruction stream.
Next, in our INC block, we define some patterns to match on specific opcodes. The code in the parenthesis after the lazy AND would be executed only if the previous condition failed.
At the bottom, an unknown opcode handler sets the DMP flag if no opcode matches.
Finally, we add a non-boolean AND, which always runs, in order to increment the program counter after the whole cycle’s computation has finished.

All that’s left is to write the actual handler in the second parenthesis for each opcode:

// run(payload, payload_length)
usize run(u32 *pl, usize p_len) {
    u8 *m = calloc(0xFFFF + 0xFF, sizeof(u8));
    for (
        ;
        ( dbg0 ) | (
            ( !(m32(BSR) & 0x00000001) || ((p_len = (usize)m)*0 | (usize)(m = malloc(0)) * 0) ) &&
            ( m32(BPC) < p_len-1 )
        );
        (
            (/* MOV */ (OP_M(BPC_OP) == 0x00000000) && (
                /* MOV */  ( (!FLG16_M(BPC_OP) && !FLGC_M(BPC_OP)) && (m16(ARG1_12_M(BPC_OP)) = m16(ARG2_12_M(BPC_OP)))             +1) ||
                /* MMI */  ( (!FLG16_M(BPC_OP) &&  FLGC_M(BPC_OP)) && (m16(ARG1_12_M(BPC_OP)) = ARG2_12_M(BPC_OP))                  +1) ||
                /* MOVW */ ( ( FLG16_M(BPC_OP) && !FLGC_M(BPC_OP)) && (m16(m16(ARG1_12_M(BPC_OP))) = m16(m16(ARG2_12_M(BPC_OP))))   +1) ||
                /* MMIW */ ( ( FLG16_M(BPC_OP) &&  FLGC_M(BPC_OP)) && (m16(BT1_M(BPC_OP)) = ARG2_16_M(BPC_OP))                      +1)
            )) ||
            (/* ADD */ (OP_M(BPC_OP) == 0x20000000) && (
                /* ADD */  ( (!FLG16_M(BPC_OP) ) && (m16(ARG1_12_M(BPC_OP)) += m16(ARG2_12_M(BPC_OP)))                              +1) ||
                /* ADDW */ ( ( FLG16_M(BPC_OP) ) && (m16(m16(ARG1_12_M(BPC_OP))) += m16(m16(ARG2_12_M(BPC_OP))))                    +1)
            )) ||
            (/* SUB */ (OP_M(BPC_OP) == 0x40000000) && (
                /* SUB */  ( (!FLG16_M(BPC_OP) ) && (m16(ARG1_12_M(BPC_OP)) -= m16(ARG2_12_M(BPC_OP)))                              +1) ||
                /* SUBW */ ( ( FLG16_M(BPC_OP) ) && (m16(m16(ARG1_12_M(BPC_OP))) -= m16(m16(ARG2_12_M(BPC_OP))))                    +1)
            )) ||
            (/* OR  */ (OP_M(BPC_OP) == 0x60000000) && (
                /* OR */  ( (!FLG16_M(BPC_OP) ) && (m16(ARG1_12_M(BPC_OP)) = m16(ARG1_12_M(BPC_OP)) | m16(ARG2_12_M(BPC_OP)))                +1) ||
                /* ORW */ ( ( FLG16_M(BPC_OP) ) && (m16(m16(ARG1_12_M(BPC_OP))) = m16(m16(ARG1_12_M(BPC_OP))) | m16(m16(ARG2_12_M(BPC_OP)))) +1)
            )) ||
            (/* NOT */ (OP_M(BPC_OP) == 0x80000000) && (
                /* NOT */  ( (!FLG16_M(BPC_OP) ) && (m16(ARG1_12_M(BPC_OP)) = !m16(ARG1_12_M(BPC_OP)))                +1) ||
                /* NOTW */ ( ( FLG16_M(BPC_OP) ) && (m16(m16(ARG1_12_M(BPC_OP))) = !m16(m16(ARG1_12_M(BPC_OP))))      +1)
            )) ||
            (/* INT */ (OP_M(BPC_OP) == 0xa0000000) && (true)) ||
            (/* CMP */ (OP_M(BPC_OP) == 0xc0000000) && (true)) ||
            (/* JMP */ (OP_M(BPC_OP) == 0xe0000000) && (true)) ||
            (m32(BSR) = m32(BSR) ^ 0x00000001) // Failure. Unknown Opcode, set DMP Flag.
            ) & ( m32(BPC) += 1 )
        );
    free(m);
    return p_len;
}

I’ve left out the INT, CMP, and JMP handlers since they require code a bit different from the other instructions, however the following code (when assembled) actually runs!

MMIW 0xAA, 0xDEAD
MMIW 0xAC, 0xBEEF
MOV 0xC4, 0xAA
ADD 0xC4, 0xAC

; Terminate via DMP
MMIW 0x04, 0x0001

Assembled:

11aadead11acbeef000c40aa200c40ac11040001600ff0ff

The last instruction sets the DMP flag, which triggers our exit routine, and the run function will return a pointer to our memory. If we inspect the memory, we will indeed find the values in the right places!

0x2a: dead
0x2b: beef
0x2c: 0
0x2d: 0
0x2e: 0
0x2f: 0
0x30: 0
0x31: 9d9c

This is the end for this simple POC. The rest of the virtual machine's internals fall outside the scope of this project.

As a bonus at the end, let’s take a look at the preprocessor output of the code so far, it’s some really tangled spaghetti:

usize run(u32 *pl, usize p_len) {
    u8 *m = calloc(0xFFFF + 0xFF, sizeof(u8));
    for (
        u32 stackPointer, stackDest = 0x0000000;
        ( debug_print(0, *((u32*)(pl + ( *((u32*)(m+0x00)) ))), ( *((u32*)(m+0x00)) )) ) | (
            ( !(( *((u32*)(m+0x01*4)) ) & 0x00000001) || ((p_len = (usize)m)*0 | (usize)(m = malloc(0)) * 0) ) &&
            ( ( *((u32*)(m+0x00)) ) < p_len-1 )
        );
        (
            ( ((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0xE0000000) == 0x00000000) && (
                           ( (!(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x10000000) && !(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x01000000)) && (( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00fff000) >> 12))) ) = ( *((u16*)(m+(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00000fff))) )) +1) ||
                           ( (!(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x10000000) && (*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x01000000)) && (( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00fff000) >> 12))) ) = (*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00000fff)) +1) ||
                           ( ( (*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x10000000) && !(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x01000000)) && (( *((u16*)(m+( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00fff000) >> 12))) ))) ) = ( *((u16*)(m+( *((u16*)(m+(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00000fff))) ))) )) +1) ||
                           ( ( (*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x10000000) && (*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x01000000)) && (( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00ff0000) >> 16))) ) = (*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x0000ffff)) +1)
            )) ||
            ( ((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0xE0000000) == 0x20000000) && (
                           ( (!(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x10000000) ) && (( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00fff000) >> 12))) ) += ( *((u16*)(m+(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00000fff))) )) +1) ||
                           ( ( (*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x10000000) ) && (( *((u16*)(m+( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00fff000) >> 12))) ))) ) += ( *((u16*)(m+( *((u16*)(m+(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00000fff))) ))) )) +1)
            )) ||
            ( ((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0xE0000000) == 0x40000000) && (
                           ( (!(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x10000000) ) && (( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00fff000) >> 12))) ) -= ( *((u16*)(m+(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00000fff))) )) +1) ||
                           ( ( (*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x10000000) ) && (( *((u16*)(m+( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00fff000) >> 12))) ))) ) -= ( *((u16*)(m+( *((u16*)(m+(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00000fff))) ))) )) +1)
            )) ||
            ( ((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0xE0000000) == 0x60000000) && (
                          ( (!(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x10000000) ) && (( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00fff000) >> 12))) ) = ( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00fff000) >> 12))) ) | ( *((u16*)(m+(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00000fff))) )) +1) ||
                          ( ( (*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x10000000) ) && (( *((u16*)(m+( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00fff000) >> 12))) ))) ) = ( *((u16*)(m+( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00fff000) >> 12))) ))) ) | ( *((u16*)(m+( *((u16*)(m+(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00000fff))) ))) )) +1)
            )) ||
            ( ((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0xE0000000) == 0x80000000) && (
                           ( (!(*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x10000000) ) && (( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00fff000) >> 12))) ) = !( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00fff000) >> 12))) )) +1) ||
                           ( ( (*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x10000000) ) && (( *((u16*)(m+( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00fff000) >> 12))) ))) ) = !( *((u16*)(m+( *((u16*)(m+((*((u32*)(pl + ( *((u32*)(m+0x00)) ))) & (u32)0x00fff000) >> 12))) ))) )) +1)
            ))
            (( *((u32*)(m+0x01*4)) ) = ( *((u32*)(m+0x01*4)) ) | 0x00000001)
            ) & ( ( *((u32*)(m+0x00)) ) += 1 )
        );
    free(m);
    return p_len;
}

Subscribe to monty.sh