Chapter 1: A Tour of Computer Systems

1.1 Information is Bits + Context

A source program is a sequence of bits (with a value of 0 or 1) organised in 8-bit chunks called bytes. Each byte represents some text character in the program. Most modern systems represent text characters using the ASCII standard that represents each character with a unique byte-sized integer value.

Files such as hello.c that consist exclusively of ASCII characters are known as text files. All other files are known as binary files.

All information in a system (disk files, programs stored in memory, user data stored in memory, data transferred across a network) is represented as a bunch of bits. The only thing that distinguishes different data objects is the context in which we view them. In different contexts, the same sequence of bytes might represent an integer, floating-point number, character string, or machine instruction.

1.2 Programs Are Translated By Other Programs into Different Forms

In order to run hello.c on the system, the individual C statements must be translated by other programs into a sequence of low-level machine-language instructions. These instructions are then packaged in a form called an executable object program and stored as a binary disk file. Object programs are also referred to as executable object files.

The programs that perform the four phases (preprocessor, compiler, assembler, and linker) are known collectively as the compilation system.

Use gcc -o hello hello.c to compile a C program with the GNU Compiler Collection.

Preprocessing phase

The preprocessor (cpp) modifies the original C program according to directives that begin with the # character. The result is another C program, typically with the .i suffix.

Compilation phase

The compiler (cc1) translates the text file hello.i into the text file hello.s, which contains an assembly-language program. Each statement in an assembly-language program exactly describes one low-level machine-language instruction in a standard text form. Assembly language is useful because it provides a common output language for different compilers for different high-level languages.

Assembly phase

Next, the assembler (as) translates hello.s into machine-language instructions, packages them in a form known as a relocatable object program, and stores the result in the object file hello.o.

Linking phase

The linker (ld) handles merging printf function which resides in a separate precompiled object file called printf.o.

1.3 It Pays to Understand How Compilation Works

1.4 Processors Read and Interpret Instructions Stored in Memory

To run the executable file on a Unix system, we type its name to an application program known as a shell.  The shell is a command-line interpreter that prints a prompt, waits for you to type a command line, and then performs the command.

1.4.1 Hardware Organisation of a System

Buses carry bytes of information back and forth between the components. Buses are typically designed to transfer fixed-sized chunks of bytes known as a word. The number of bytes in a word (the word size) is a fundamental system parameter that varies across systems.

Most machines today have word sizes of either 4 bytes (32 bits) or 8 bytes (64 bits).

I/O Devices
Input/output (I/O) devices are the system’s connection to the external world. Initially, the executable hello program resides on the disk. Each I/O device is connected to the I/O bus by either a controller or an adapter.

Main Memory
The main memory is a temporary storage device that holds both a program and the data it manipulates while the processor is executing the program.

Physically, main memory consists of a collection of dynamic random access memory (DRAM) chips.

Logically, memory is organized as a linear array of bytes, each with its own unique address (array index) starting at zero.

The central processing unit (CPU) or processor is the engine that interprets (or executes) instructions stored in main memory. At its core is a word-sized storage device (or register) called the program counter (PC). At any point in time, the PC points at (contains the address of) some machine-language instruction in main memory.

The register file is a small storage device that consists of a collection of word-sized registers, each with its own unique name. The arithmetic/logic unit (ALU) computes new data and address values.

1.4.2 Running the hello program

1.5 Caches Matter

A system spends a lot of time moving information from one place to another.  Thus, a major goal for system designers is to make these copy operations run as fast as possible.

The disk drive on a typical system might be 1000 times larger than the main memory, but it might take the processor 10,000,000 times longer to read a word from disk than from memory. Similarly, a typical register file stores only a few hundred bytes of information, as opposed to billions of bytes in the main memory. However, the processor can read data from the register file almost 100 times faster than from memory. As semiconductor technology progresses over the years, this processor-memory gap continues to increase.

To deal with the processor-memory gap, system designers include smaller faster storage devices called cache memories (or simply caches) that serve as temporary staging areas for information that the processor is likely to need in the near future.

An L1 cache on the processor chip holds tens of thousands of bytes and can be accessed nearly as fast as the register file. A larger L2 cache with hundreds of thousands to millions of bytes is connected to the processor by a special bus. It might take 5 times longer for the process to access the L2 cache than the L1 cache, but this is still 5 to 10 times faster than accessing the main memory.

The L1 and L2 caches are implemented with a hardware technology known as static random access memory (SRAM).

1.6 Storage Devices Form a Hierarchy

The storage devices in every computer system are organized as a memory hierarchy. As we move from the top of the hierarchy to the bottom, the devices become slower, larger, and less costly per byte.

1.7 The Operating System Manages the Hardware

When the shell loaded and ran the hello program, and when the hello program printed its message, neither program accessed the keyboard, display, disk, or main memory directly. Rather, they relied on the services provided by the operating system. We can think of the operating system as a layer of software interposed between the application program and the hardware. All attempts by an application program to manipulate the hardware must go through the operating system.

The operating system has two primary purposes:
(1) to protect the hardware from misuse by runaway applications, and
(2) to provide applications with simple and uniform mechanisms for manipulating complicated and often wildly different low-level hardware devices.

The operating system achieves both goals via the fundamental abstractions: processes, virtual memory, and files.

Files are abstractions for I/O devices, virtual memory is an abstraction for both the main memory and disk I/O devices, and processes are abstractions for the processor, main memory, and I/O devices.

1.7.1 Processes

A process is the operating system’s abstraction for a running program.  Multiple processes can run concurrently on the same system. The operating system performs this interleaving with a mechanism known as context switching.  The operating system keeps track of all the state information that the process needs in order to run. This state, which is known as the context, includes information such as the current values of the PC, the register file, and the contents of main memory.  When the operating system decides to transfer control from the current process to some new process, it performs a context switch by saving the context of the current process, restoring the context of the new process, and then passing control to the new process. The new process picks up exactly where it left off.

1.7.2 Threads

A process can actually consist of multiple execution units, called threads, each running in the context of the process and sharing the same code and global data.

Threads are an increasingly important programming model because of the requirement for concurrency in network servers, because it is easier to share data between multiple threads than between multiple processes, and because threads are typically more efficient than processes.

1.7.3 Virtual Memory

Virtual memory is an abstraction that provides each process with the illusion that it has exclusive use of the main memory. Each process has the same uniform view of memory, which is known as its virtual address space.

1.7.4 Files

A file is simply a sequence of bytes. Every I/O device, including disks, keyboards, displays, and even networks, is modeled as a file. All input and output in the system is performed by reading and writing files, using a small set of system calls known as Unix I/O.

1.8 Systems communicate with other systems using networks

From the point of view of an individual system, the network can be viewed as just another I/O device.  When the system copies a sequence of bytes from main memory to the network adapter, the data flows across the network to another machine, instead of, say, to a local disk drive. Similarly, the system can read data sent from other machines and copy this data to its main memory.

1.9 Important Themes

1.9.1 Concurrency and Parallelism

Two demands on computers have been constant forces driving improvements: we want them to do more, and we want them to run faster.  Both of these factors improve when the processor does more things at once.

We use the term concurrency to refer to the general concept of a system with multiple, simultaneous activities, and the term parallelism to refer to the use of concurrency to make a system run faster.

Thread level concurrency
Building on the process abstraction, we are able to devise systems where multiple programs execute at the same time, leading to concurrency. With threads, we can even have multiple control flows executing within a single process.

Until recently, most actual computing was done by a single processor, even if that processor had to switch among multiple tasks. This configuration is known as a uniprocessor system.

When we construct a system consisting of multiple processors all under the control of a single operating system kernel, we have a multiprocessor system.

The use of multiprocessing can improve system performance by:

  • reducing the need to simulate concurrency when performing multiple tasks
  • running a single application program faster, but only if that program is expressed in terms of multiple threads that can effectively execute in parallel.

Instruction-Level Parallelism
At a much lower level of abstraction, modern processors can execute multiple instructions at one time, a property known as instruction-level parallelism.

For example, early microprocessors, such as the 1978-vintage Intel 8086 required multiple (typically, 3–10) clock cycles to execute a single instruction. More recent processors can sustain execution rates of 2–4 instructions per clock cycle. Any given instruction requires much longer from start to finish, perhaps 20 cycles or more, but the processor uses a number of clever tricks to process as many as 100 instructions at a time.

Single-Instruction, Multiple-Data (SIMD) Parallelism
At the lowest level, many modern processors have special hardware that allows a single instruction to cause multiple operations to be performed in parallel, a mode known as single-instruction, multiple-data, or SIMD parallelism. For example, recent generations of Intel and AMD processors have instructions that can add four pairs of single-precision floating-point numbers (C data type float) in parallel.

1.9.2 The Importance of Abstractions in Computer Science

The use of abstractions is one of the most important concepts in computer science.

On the processor side, the instruction set architecture provides an abstraction of the actual processor hardware. With this abstraction, a machine-code program behaves as if it were executed on a processor that performs just one instruction at a time. The underlying hardware is far more elaborate, executing multiple instructions in parallel, but always in a way that is consistent with the simple, sequential model. By keeping the same execution model, different processor implementations can execute the same machine code, while offering a range of cost and performance.

On the operating system side, we have introduced three abstractions: files as an abstraction of I/O, virtual memory as an abstraction of program memory, and processes as an abstraction of a running program. To these abstractions we add a new one: the virtual machine, providing an abstraction of the entire computer, including the operating system, the processor, and the programs.

Part 1 Program Strucure and Execution

Chapter 2: Representing and Manipulating Information

Modern computers store and process information represented as 2-valued signals, or binary digits known as bits.

The electronic circuitry for storing and performing computations on 2-valued signals is very simple and reliable, enabling manufacturers to integrate millions, or even billions, of such circuits on a single silicon chip.

When we group bits together and apply some interpretation that gives meaning to the different possible bit patterns, however, we can represent the elements of any finite set.

Unsigned encodings represent numbers greater than or equal to 0 and are based on traditional binary notation.

Signed integers are numbers that may be either positive or negative and are most commonly represented by two's complement encodings.

Floating-point encodings are a base-two version of scientific notation for representing real numbers.

Computers implement arithmetic operations, such as addition and multiplication, with these different representations, similar to the corresponding operations on integers and real numbers.

Computer representations use a limited number of bits to encode a number, and hence some operations can overflow when the results are too large to be represented.  For example, on most of today’s computers (those using a 32-bit representation of data type int), computing the expression 200 * 300 * 400 * 500 yields −884,901,888.

Floating-point arithmetic has different mathematical properties. The product of a set of positive numbers will always be positive, although overflow will yield the special value +∞.  Floating-point arithmetic is not associative, due to the finite precision of the representation.

The different mathematical properties of integer vs. floating-point arithmetic stem from the difference in how they handle the finiteness of their representations — integer representations can encode a comparatively small range of values, but do so precisely, while floating-point representations can encode a wide range of values, but only approximately.

2.1 Information Storage

Rather than accessing individual bits in memory, most computers use blocks of eight bits, or bytes, as the smallest addressable unit of memory. A machine-level program views memory as a very large array of bytes, referred to as virtual memory. Every byte of memory is identified by a unique number, known as its address, and the set of all possible addresses is known as the virtual address space.

Pointers are a central feature of C. They provide the mechanism for referencing elements of data structures, including arrays. Just like a variable, a pointer has two aspects: its value and its type. The value indicates the location of some object, while its type indicates what kind of object (e.g., integer or floating-point number) is stored at that location.

2.1.1 Hexadecimal Notation

A single byte consists of 8 bits. In binary notation, its value ranges from 00000000^2 to 11111111^2. When viewed as a decimal integer, its value ranges from 0^10 to 255^10.  Neither notation is very convenient for describing bit patterns. Binary notation is too verbose, while with decimal notation, it is tedious to convert to and from bit patterns. Instead, we write bit patterns as base-16, or hexadecimal numbers. Hexadecimal (or simply “hex”) uses digits ‘0’ through ‘9’ along with characters ‘A’ through ‘F’ to represent 16 possible values. Written in hexadecimal, the value of a single byte can range from 00^16 to FF^16.

In C, numeric constants starting with 0x or 0X are interpreted as being in hexadecimal. The characters ‘A’ through ‘F’ may be written in either upper or lower case. For example, we could write the number FA1D37B16 as 0xFA1D37B, as 0xfa1d37b, or even mixing upper and lower case, e.g., 0xFa1D37b.

Conversely, given a binary number 1111001010110110110011, you convert it to hexadecimal by first splitting it into groups of 4 bits each. Note, however, that if the total number of bits is not a multiple of 4, you should make the leftmost group be the one with fewer than 4 bits, effectively padding the number with leading zeros.

2.1.2 Words

Every computer has a word size, indicating the nominal size of integer and pointer data. Since a virtual address is encoded by such a word, the most important system parameter determined by the word size is the maximum size of the virtual address space.

That is, for a machine with a w-bit word size, the virtual addresses can range from 0 to 2^w − 1, giving the program access to at most 2^w bytes.

Most personal computers today have a 32-bit word size. This limits the virtual address space to 4 gigabytes (written 4 GB), that is, just over 4 × 10^9 bytes.

2.1.3 Data Sizes

Computers and compilers support multiple data formats using different ways to encode data, such as integers and floating point, as well as different lengths. For example, many machines have instructions for manipulating single bytes, as well as integers represented as 2-, 4-, and 8-byte quantities. They also support floating-point numbers represented as 4- and 8-byte quantities.

2.1.4 Addressing and Byte Ordering

For program objects that span multiple bytes, we must establish two conventions:

  • what the address of the object will be, and
  • how we will order the bytes in memory.

In virtually all machines, a multi-byte object is stored as a contiguous sequence of bytes, with the address of the object given by the smallest address of the bytes used. For example, suppose a variable x of type int has address 0x100, that is, the value of the address expression &x is 0x100. Then the 4 bytes of x would be stored in memory locations 0x100, 0x101, 0x102, and 0x103.

For ordering the bytes representing an object, there are two common conventions. Consider a w-bit integer having a bit representation [xw−1, xw−2, . . . , x1, x0], where xw−1 is the most significant bit and x0 is the least. Assuming w is a multiple of 8, these bits can be grouped as bytes, with the most significant byte having bits [xw−1, xw−2, . . . , xw−8], the least significant byte having bits [x7, x6, . . . , x0], and the other bytes having bits from the middle. Some machines choose to store the object in memory ordered from least significant byte to most, while other machines store them from most to least. The former convention — where the least significant byte comes first — is referred to as little endian. This convention is followed by most Intel-compatible machines. The latter convention—where the most significant byte comes first—is referred to as big endian. This convention is followed by most machines from IBM and Sun Microsystems.

Generally, byte ordering is invisible to the programmer.  It is an issue when binary data is communicated over the network between different machines.  It is also an issue when looking at the byte sequences representing integer data, as occurs when inspecting machine level programs.  A third case where byte ordering becomes visible is when programs are written that circumvent the normal type system. In the C language, this can be done using a cast to allow an object to be referenced according to a different data type from which it was created. Such coding tricks are strongly discouraged for most application programming, but they can be quite useful and even necessary for system-level programming.

2.1.5 Representing Strings

A string in C is encoded by an array of characters terminated by the null (having value 0) character. Each character is represented by some standard encoding, with the most common being the ASCII character code.

Due to ASCII character encoding, text data is more platform independent than binary data.

2.1.6 Representing Code

Instruction codings per machine differ. Different machine types use different and incompatible instructions and encodings. Binary code is seldom portable across different combinations of machine and operating system.

From machine perspective a program is simply a sequence of bytes.

2.1.7 Introduction to Boolean Algebra

Binary values 1 and 0 encode TRUE and FALSE.

Operations ~ & | and ^ encode logical operations NOT (¬), AND (∧), OR (∨) and EXCLUSIVE-OR (⊕)

The four Boolean operators can also operate on bit vectors, strings of zeros and ones of some fixed length w.

2.1.8 Bit-Level Operations in C

&, |, ~ and ^ or used in C as bitwise operators. These can be applied to any "integral" data type, that is char or int whether with or without short, long, long long or unsigned qualifers.

The best way to determine the effect of a bit-level expression is to expand the hexadecimal arguments to their binary representations, perform the operations in binary, and then convert back to hexadecimal.

One common use of bit-level operations is to implement masking operations,
where a mask is a bit pattern that indicates a selected set of bits within a word.

2.1.9 Logical Operations in C

|| && and ! are logical operators OR, AND and NOT.

They are quite different to bit-level operations. The logical operations treat any nonzero argument as representing True and argument 0 as representing False. They return either 1 or 0, indicating a result of either True or False, respectively.

2.1.10 Shift Operations in C

Shift operations in C shift bit patterns to the left and right.

x << k shifts x k bits to the left, dropping the k most significant bits and filling the right end with zeros.

Right shift may be logical or arithmetic shift: a logical right shift fills the left end with k zeros, a arithmetic shift right fills the left with with k representations of the most significant bit.

The C standards do not precisely define which type of right shift should be used. For unsigned data (i.e., integral objects declared with the qualifier unsigned), right shifts must be logical. For signed data (the default), either arithmetic or logical shifts may be used. In practice, however, almost all compiler/machine combinations use arithmetic right shifts for signed data.

2.2 Integer Representations