Assembly language

Introduction

Intel 80486 CPU courtesy Matt Britt at the English language Wikipedia, CC BY-SA 3.0 http://creativecommons.org/licenses/by-sa/3.0/, via Wikimedia Commons

We should know by this stage in the course that everything in the memory of a computer system is stored as binary digits. Text, numbers, images, and even the instructions for programs that have been written for that computer are ultimately stored as ones and zeros.

If we knew the format of the instructions for a particular CPU, we could write a program for the CPU just using ones and zeros to represent each instruction to be carried out by the program. In fact this is exactly how programs were written in the early days of computing, using switches or wiring to enter the ones and zeros! This type of instruction that could be directly executed by the CPU is called a machine code instruction. Programs made up entirely from these instructions are known as machine code programs.

By the 1970s and 1980s when cheap “microcomputers” started to be mass produced, machine code programs could be entered using a keyboard and stored directly into RAM ready to be run.

An example of a trivial machine code program for a particular CPU, the Z80, is the following four bytes, which use the accumulator register to add together the values 3 and 5.

00111110 00000011
11000110 00000101

Clearly, writing machine code instructions directly in binary is difficult, time-consuming and error-prone. An easier way of writing the program is to use easy-to-remember commands like “add”, and that is exactly what assembly language lets you do.

This assembly language program does exactly what the machine code one above does:

LD  A,3
ADD A,5

The characters LD and ADD are known as mnemonics. A mnemonic is a technique for remembering something. Examples include:

BIDMAS
Never Ever Support Wolves
How I wish I could calculate pi

In Z80 assembly language ADD A is a mnemonic for the machine code instruction 11000110, and LD A is a mnemonic for the machine code instruction 00111110.

The CPU does not understand the mnemonic, only the machine code instructions, so a translation program called an assembler is used to convert the assembly language program into machine code. The programmer can now write programs using mnemonics like LD A and ADD A, rather than looking up or trying to remember the machine code representation in binary.
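To make the translation step concrete, here is a minimal sketch in Python of what an assembler does for the two-line program above. It is not a real Z80 assembler: the function name `assemble` and the lookup table `OPCODES` are made up for this illustration, and the table only contains the two mnemonics used in this lesson.

```python
# Hypothetical lookup table: only the two instructions from this lesson.
OPCODES = {
    "LD A": 0b00111110,   # load the accumulator with the value that follows
    "ADD A": 0b11000110,  # add the value that follows to the accumulator
}


def assemble(source):
    """Translate lines such as 'LD  A,3' into a list of machine code bytes."""
    machine_code = []
    for line in source.strip().splitlines():
        mnemonic, value = line.split(",")
        # Collapse repeated spaces so that "LD  A" matches the key "LD A".
        mnemonic = " ".join(mnemonic.split())
        machine_code.append(OPCODES[mnemonic])  # the instruction byte
        machine_code.append(int(value))         # the data byte
    return machine_code


program = """
LD  A,3
ADD A,5
"""

program_bytes = assemble(program)
for byte in program_bytes:
    print(format(byte, "08b"))  # show each byte as 8 binary digits
```

Running this prints the same four bytes as the machine code program shown earlier, one byte per line.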

Instructions are often in two parts: the operation or command to be performed is called the operation code (opcode), and the data or address it operates on is called the operand. For the example above:

Opcode	Operand
LD A	3
ADD A	5

Z80 assembly language program to add two numbers

We commonly use the terms source code to describe the assembly language code before it is translated, and object code to describe the machine code that is generated by the assembler.

Note the assembler also lets us use decimal representation rather than binary, so when loading the accumulator with 3 in the first line of the assembly language program we wrote 3 not 00000011. Most assemblers let you use decimal, binary or hexadecimal representation as you prefer.
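The idea that decimal, binary and hexadecimal are just different ways of writing the same value can be checked with a short Python sketch. The `0b` and `0x` prefixes are Python's own notation and are only an assumption here; each assembler has its own way of marking binary and hexadecimal numbers.

```python
# The four bytes of the Z80 program, written three different ways.
in_decimal = [62, 3, 198, 5]
in_binary = [0b00111110, 0b00000011, 0b11000110, 0b00000101]
in_hex = [0x3E, 0x03, 0xC6, 0x05]

# All three lists hold exactly the same values.
print(in_decimal == in_binary == in_hex)

# Show each value in decimal, binary and hexadecimal side by side.
for value in in_decimal:
    print(value, format(value, "08b"), format(value, "02X"))
```

Whichever representation the programmer chooses, the assembler stores the same pattern of ones and zeros in memory.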

Little Man Computer

These days it is rare for programmers to write in machine code or assembly language, as high level languages allow you to write much more complex programs in a shorter amount of time. However, understanding how assembly language programs work gives us an insight into how high level language programs and compilers work.

There’s no need to know any specific CPU instruction set for most exams, so educators have come up with a generic CPU that doesn’t really exist, and have called it Little Man Computer or LMC. The idea is that you can think of the CPU as containing a little man, running around fetching instructions from memory, decoding the instructions so it knows what to do, and then carrying out the instructions.

LMC helps us understand the Fetch-Decode-Execute (FDE) cycle, but also allows us to edit, assemble and run assembly language programs. You can think of it as a simulator and an IDE.

A very commonly used one, created by Peter Higginson, is used with 14-16 year old students to teach the concept of FDE. He has also created one that allows students who are studying for A Level Computer Science with the AQA exam board to write assembly language programs. Not content with that, he has now released a third more complex version that can actually generate machine code for a real CPU called the ARM.

All three versions run in a web browser. When your teacher asks you to, open one of the links below in a new browser window:

LMC instructions

In the simple version of the LMC there is one usable register, called the accumulator, and a limited number of instructions, detailed below.

Instruction	Mnemonic	Description
INPUT	INP	Get data from the keyboard and store into the accumulator
OUTPUT	OUT	Display the contents of the accumulator in the output area of the simulator
HALT	HLT	Stop the program
LOAD	LDA	Load the value stored in the given address into the accumulator
STORE	STA	Store the value currently in the accumulator into the given address
ADD	ADD	Add the value stored at the given address to what is already in the accumulator
SUBTRACT	SUB	Subtract the value stored at the given address from what is already in the accumulator
DATA	DAT	Reserve one memory location to store some data (can be followed by a value to set the initial value of the memory location)
BRANCH ALWAYS	BRA	Change the program counter value to the specified address. Results in the “jumping” of execution to that address always.
BRANCH IF ZERO	BRZ	If the accumulator contains zero, change the program counter value to the specified address. Results in the “jumping” of execution to that address when accumulator is zero, or continuing as normal otherwise.
BRANCH IF POSITIVE	BRP	If the accumulator contains a positive number OR zero, change the program counter value to the specified address. Results in the “jumping” of execution to that address when accumulator is positive or zero, or continuing as normal otherwise.

Instruction set for simple LMC
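To see how these instructions fit into the Fetch-Decode-Execute cycle, here is a minimal sketch of an LMC-style simulator in Python. The function name `run` is made up for this illustration. The numeric codes (1xx ADD, 2xx SUB, 3xx STA, 5xx LDA, 6xx BRA, 7xx BRZ, 8xx BRP, 901 INP, 902 OUT, 000 HLT) follow the conventional LMC encoding, which is an assumption because this lesson does not list them. The sketch is also simplified: it lets the accumulator go negative rather than copying how any particular simulator handles that case.

```python
def run(memory, inputs):
    """Run LMC machine code held in `memory` and return the list of outputs."""
    memory = list(memory) + [0] * (100 - len(memory))  # 100 memory locations
    inputs = list(inputs)
    outputs = []
    accumulator = 0
    program_counter = 0
    while True:
        # Fetch the next instruction and move the program counter on.
        instruction = memory[program_counter]
        program_counter += 1
        # Decode: the first digit is the opcode, the last two are the address.
        opcode, address = divmod(instruction, 100)
        # Execute.
        if instruction == 0:        # HLT
            return outputs
        elif instruction == 901:    # INP
            accumulator = inputs.pop(0)
        elif instruction == 902:    # OUT
            outputs.append(accumulator)
        elif opcode == 1:           # ADD
            accumulator += memory[address]
        elif opcode == 2:           # SUB
            accumulator -= memory[address]
        elif opcode == 3:           # STA
            memory[address] = accumulator
        elif opcode == 5:           # LDA
            accumulator = memory[address]
        elif opcode == 6:           # BRA
            program_counter = address
        elif opcode == 7:           # BRZ
            if accumulator == 0:
                program_counter = address
        elif opcode == 8:           # BRP
            if accumulator >= 0:
                program_counter = address
        else:
            raise ValueError(f"unknown instruction {instruction}")


# A countdown: INP, OUT, SUB 5, BRP 1, HLT, then the value 1 at address 5.
countdown = [901, 902, 205, 801, 0, 1]
print(run(countdown, [3]))  # prints [3, 2, 1, 0]
```

The countdown program outputs the number entered, subtracts 1, and branches back to the OUT instruction for as long as the accumulator is positive or zero.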

Activity 1

Use the simple LMC to enter (left hand panel) and assemble (submit button) the following program:

        INP
        STA 10
        INP
        STA 11
        HLT

Before running the program, predict what it will do, and how you think the RAM will be altered when it is run. Now execute it (run button) to see what will happen.

Extend your program so that it adds together the contents of the two memory locations and stores the result into a third memory location.

Make a note of the working program in your class book when complete.

Labels

A useful feature of an assembler is called a label. It is used to give a friendly name to an address. In the example above, we used locations 10 and 11 to store input values. The program would have been more readable if we had used labels instead of the numbers 10 and 11.

        INP
        STA NUMBER1
        INP
        STA NUMBER2
        HLT
NUMBER1 DAT
NUMBER2 DAT

A second advantage of using labels is that we don’t have to keep track of how long our program is, as there is no danger of the program and its data using the same memory locations. In the original code, if we had added enough extra instructions before HLT, the program itself would eventually have spread into memory locations 10 and 11, and storing the input values there would then have overwritten program instructions! Using labels means this will never occur because the assembler works out what the addresses of NUMBER1 and NUMBER2 will be, so if extra instructions are inserted the memory addresses used for NUMBER1 and NUMBER2 will automatically be changed by the assembler.

Note DAT is not an instruction, merely a directive for the assembler itself, which is the programmer’s way of telling the assembler “reserve a memory location here”. If we put DAT on its own it will simply reserve a memory location. If we follow DAT with a value, it will store that initial value at the address so it is available when the program first runs.

NUMBER1 DAT  - stores no initial value at location NUMBER1
NUMBER2 DAT 65 - stores the initial value 65 at location NUMBER2

The value can be overwritten if needed.

By using labels in this way we are in effect creating the equivalent of a variable in a high level language.
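How the assembler works out the address behind each label can be sketched in a few lines of Python. This is only an illustration: the function name `label_addresses` is made up, and it assumes that every line of the program occupies exactly one memory location, starting at address 0.

```python
MNEMONICS = {"INP", "OUT", "HLT", "LDA", "STA", "ADD", "SUB",
             "DAT", "BRA", "BRZ", "BRP"}


def label_addresses(source):
    """Map each label in the program to the address of the line it is on."""
    addresses = {}
    lines = [line for line in source.splitlines() if line.strip()]
    for address, line in enumerate(lines):
        first_word = line.split()[0]
        # Anything at the start of a line that is not a mnemonic is a label.
        if first_word not in MNEMONICS:
            addresses[first_word] = address
    return addresses


original = """
        INP
        STA NUMBER1
        INP
        STA NUMBER2
        HLT
NUMBER1 DAT
NUMBER2 DAT
"""

# The same program with one extra instruction inserted before HLT.
extended = original.replace("        HLT", "        OUT\n        HLT")

print(label_addresses(original))  # {'NUMBER1': 5, 'NUMBER2': 6}
print(label_addresses(extended))  # {'NUMBER1': 6, 'NUMBER2': 7}
```

Inserting one instruction moves both labels along by one address, without the programmer having to change anything.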

Activity 2

Write a program that gets the user to input three numbers and stores them.
It should then subtract the second and third numbers from the first and display the output.
Test the program with the values 9, 3 and 2, to check it produces an output of 4.
Record the completed program in your book.

Labels for branching

Another use for labels is when using branch instructions. Try the following code in LMC:

        INP
        STA VAR1
        BRA SKIP
        OUT
SKIP    HLT
VAR1    DAT

Why does this program not reach the OUT instruction?

BRA is an instruction that changes the usual flow of a program, which normally would proceed from one instruction to the next in a sequential manner. In the example, rather than proceeding from the 3rd instruction to the 4th, the program will jump over the OUT instruction and move straight to the HLT instruction at the label SKIP.

BRZ will also cause a branch to occur like BRA, but only if the accumulator contains zero. We can make use of this to produce the equivalent of the high level language if…then construct.
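The effect of each branch instruction on the program counter can be summarised in a short Python sketch. The function name `next_program_counter` and its parameters are made up for this illustration. The addresses in the examples come from the program above, assuming it is loaded from address 0: BRA is at address 2, OUT at address 3 and the label SKIP at address 4.

```python
def next_program_counter(mnemonic, target, accumulator, program_counter):
    """Return the address of the next instruction to be fetched.

    `program_counter` already points at the instruction after the branch,
    which is where execution continues if the branch is not taken.
    """
    if mnemonic == "BRA":                        # always branch
        return target
    if mnemonic == "BRZ" and accumulator == 0:   # branch only on zero
        return target
    if mnemonic == "BRP" and accumulator >= 0:   # branch on positive or zero
        return target
    return program_counter                       # carry on as normal


print(next_program_counter("BRA", 4, 7, 3))  # 4: jumps to SKIP whatever the input
print(next_program_counter("BRZ", 4, 0, 3))  # 4: accumulator is zero, so branch
print(next_program_counter("BRZ", 4, 7, 3))  # 3: not zero, so continue to OUT
```

The last two lines show what would happen if BRA in the program were replaced by BRZ: the OUT instruction would be skipped only when the user enters 0.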

Activity 3

Produce a program that gets the user to input two numbers. If the numbers are equal, it should output 1, otherwise it should output 0.
Use the INP, SUB, BRA, BRZ, HLT and OUT instructions to complete this task, along with STA and LDA to store and load values and DAT to reserve memory for them.

Knowledge check

What is machine code?
What is the relationship between machine code and assembly language?
What program translates assembly language into machine code?
What is an opcode and operand?
How do you identify addresses in assembly language programs?
What is the difference between BRA and BRZ?
