CDA 3101: Introduction to Computer Hardware and Organization

CDA 3101: Introduction to Computer Hardware and Organization
Supplementary Notes

Charles N. Winton
Department of Computer and Information Sciences
University of North Florida
Jacksonville, FL 32224-2645

Levels of organization of a computer system:
a) Electronic circuit level
b) Logic level - combinational logic*, sequential logic*, register-transfer logic*
c) Programming level - microcode programming*, machine/assembly language programming, high-level language programming
d) Computer systems level - systems hardware* (basic hardware architecture and organization - memory, CPU (ALU, control unit), I/O, bus structures), systems software, application systems
* topics discussed in these notes

Objectives: Understand computer organization and component logic
• Boolean algebra and truth table logic
• Integer arithmetic and implementation algorithms
• IEEE floating point standard and floating point algorithms
• Register construction
• Memory construction and organization
• Register transfer logic
• CPU organization
• Machine language instruction implementation

Develop a foundation for
• Computer architecture
• Microprocessor interfacing
• System software

Sections:
• combinational logic
• sequential logic
• computer architecture

2005

Contents

Section I - Logic Level: Combinational Logic ....................... 1
    Table of binary operations ..................................... 3
    Graphical symbols for logic gates .............................. 4
    Representing data .............................................. 6
        2's complement representation ............................. 10
        Gray code ................................................. 15
    Boolean algebra ............................................... 16
    Canonical forms ............................................... 22
    Σ and Π notations ............................................. 23
    NAND-NOR conversions .......................................... 23
    Circuit analysis .............................................. 25
    Circuit simplification: K-maps ................................ 25
    Circuit design ................................................ 33
    Gray to binary decoder ........................................ 35
    BCD to 7-segment display decoder .............................. 36
    Arithmetic circuits ........................................... 39
    AOI gates ..................................................... 42
    Decoders/demultiplexers ....................................... 43
    Multiplexers .................................................. 44
    Comparators ................................................... 46
    Quine-McCluskey procedure ..................................... 48

Section II - Logic Level: Sequential Logic ........................ 50
    Set-Reset (SR) latches ........................................ 51
    Edge-triggered flip-flops ..................................... 54
    An aside about electricity .................................... 56
        (Ohm's Law, resistor values, batteries, AC)
    D-latches and D flip-flops .................................... 58
    T flip-flops and JK flip-flops ................................ 60
    Excitation controls ........................................... 61
    Registers ..................................................... 64
    Counters ...................................................... 65
    Sequential circuit design – finite state automata ............. 66
    Counter design ................................................ 70
    Moore and Mealy circuits ...................................... 72
    Circuit analysis .............................................. 72
    Additional counters ........................................... 74
    Barrel shifter ................................................ 77
    Glitches and hazards .......................................... 78
    Constructing memory ........................................... 83
    International Unit Prefixes (base 10) ........................ 88
    Circuit implementation using ROMs ............................. 89
    Hamming Code .................................................. 93

Section III – Computer Systems Level .............................. 96
    Representing numeric fractions ................................ 96
    IEEE 754 Floating Point Standard .............................. 98
    Register transfer logic ...................................... 101
    Register transfer language (RTL) ............................. 102
    UNF RTL ...................................................... 106
    Signed multiply architecture and algorithm ................... 112
    Booth's method ............................................... 114
    Restoring and non-restoring division ......................... 117
    Implementing floating point using UNF RTL .................... 125
    Computer organization ........................................ 128
    Control unit ................................................. 129
    Arithmetic and Logic unit .................................... 129
    CPU registers ................................................ 130
    Single bus CPU organization .................................. 131
    Microcode signals ............................................ 132
    Microprograms ................................................ 134
    Branching .................................................... 136
    Microcode programming ........................................ 137
    Other machine language instructions .......................... 137
    Index register ............................................... 140
    Simplified Instructional Computer (SIC) ...................... 143
    Architectural enhancements ................................... 144
    CPU-memory synchronization ................................... 146
    Inverting microcode .......................................... 148
    Vertical microcode ........................................... 149
    Managing the CPU and peripheral devices ...................... 149
    The Z80 ...................................................... 152

Page 1

Logic level: Combinational Logic

Combinational logic is characterized by functional specifications using only binary valued inputs and binary valued outputs:

                        +---------------+
  r input variables     |               |     s output variables
  X  --->               | combinational |     ---> Z = f(X)
  ...                   | logic         |     ...  (Z is a function of X)
                        +---------------+

Remark: for given values of r and s, the number of possible functions is finite since both the domain and the range of the functions are finite, of size 2^r and 2^s respectively (this is because the r input variables and the s output variables assume only the binary values 0 and 1). Although finite, it is worth noting that in practice the number of functions is usually quite large. For example, for r = 5 input variables and s = 1 output variable, the domain consists of the 2^5 = 32 possible input combinations of the two binary input values 0 and 1. To specify a function, each of these 32 possible input combinations must be assigned a value in the range, which consists of the two binary output values 0 and 1. This yields 2^32 ≈ 4 billion such functions of 5 variables!

In general, with r input variables and s output variables, the domain consists of the k = 2^r combinations of the binary input values. The range consists of the j = 2^s combinations of the binary output values. To specify a function, each of the k input combinations must be assigned 1 of the j possible values in the range. Since there are j^k possible ways to do this, there are j^k functions having r inputs and s outputs. Each such function corresponds to a logic circuit having r (binary-valued) inputs and s (binary-valued) outputs.

When r = 2 input variables and s = 1 output variable, there are 2^4 = 16 possible functions (circuits), each having the basic appearance

  X --->+-----+
        |  f  |---> Z = f(X,Y)
  Y --->+-----+
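The counting argument can be sanity-checked mechanically. A minimal Python sketch (the function name num_functions is ours, introduced only for illustration):

  # With r inputs and s outputs, the domain has k = 2^r elements and the
  # range has j = 2^s elements, so there are j**k possible functions.
  def num_functions(r, s):
      k = 2 ** r          # number of input combinations
      j = 2 ** s          # number of output combinations
      return j ** k

  print(num_functions(5, 1))   # 4294967296 = 2**32, about 4 billion
  print(num_functions(2, 1))   # 16, the binary operations tabulated below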

Recall that functions of 2 variables are called binary operations. For the usual algebra of numbers these include the familiar operations of addition, subtraction, multiplication, and division and as many more as we might care to define.

Page 2

For circuit logic, the input variables are restricted to the values 0 and 1, so there are only 4 possible input combinations of X and Y, yielding exactly 16 possible binary operations. The corresponding logic circuits provide fundamental building blocks for more complex logic circuits. Such fundamental circuits are termed logic gates. Since there are only 16 of them, they can be listed out - see the table on the following page. They are named for ease of reference and to reflect common terminology. It should be noted that some of the binary operations are "degenerate." In particular, Zero(X,Y) and One(X,Y) depend on neither X nor Y to determine their output; X(X,Y) and NOT X(X,Y) have output determined strictly by X; Y(X,Y) and NOT Y(X,Y) have output determined strictly by Y. The X and NOT X operations (or Y and NOT Y, for that matter) are usually thought of as unary operations (functions of 1 variable) rather than degenerate binary operations. As unary operations they are respectively termed the "identity" and the "complement".

Page 3

TABLE OF BINARY OPERATIONS

  X Y | Zero AND X•Ȳ  X  X̄•Y  Y  XOR OR | NOR COINC NOT Y  Y←X  NOT X  X→Y  NAND One
  0 0 |  0    0   0   0   0   0   0   0 |  1    1     1     1     1     1    1    1
  0 1 |  0    0   0   0   1   1   1   1 |  0    0     0     0     1     1    1    1
  1 0 |  0    0   1   1   0   0   1   1 |  0    0     1     1     0     0    1    1
  1 1 |  0    1   0   1   0   1   0   1 |  0    1     0     1     0     1    0    1

Here X•Ȳ is "Inhibit X on Y=1" and X̄•Y is "Inhibit Y on X=1"; Y←X denotes X + Ȳ and X→Y denotes X̄ + Y (the two implication operations).
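The table can be regenerated mechanically. A minimal Python sketch (operation names are taken from the table above; each function is identified with its output column on the input pairs 00, 01, 10, 11):

  # Enumerate all 16 two-input Boolean functions by truth-table column.
  NAMES = {
      (0, 0, 0, 0): "Zero",             (0, 0, 0, 1): "AND",
      (0, 0, 1, 0): "Inhibit X on Y=1", (0, 0, 1, 1): "X",
      (0, 1, 0, 0): "Inhibit Y on X=1", (0, 1, 0, 1): "Y",
      (0, 1, 1, 0): "XOR",              (0, 1, 1, 1): "OR",
      (1, 0, 0, 0): "NOR",              (1, 0, 0, 1): "COINC",
      (1, 0, 1, 0): "NOT Y",            (1, 0, 1, 1): "Y <- X",
      (1, 1, 0, 0): "NOT X",            (1, 1, 0, 1): "X -> Y",
      (1, 1, 1, 0): "NAND",             (1, 1, 1, 1): "One",
  }

  for n in range(16):   # 16 = 2**(2**2) functions of 2 variables
      column = tuple((n >> (3 - i)) & 1 for i in range(4))
      print(column, NAMES[column])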

Page 4

The complement (or NOT) is designated by an overbar; e.g., X̄ is the complement of X. The other most commonly employed binary operations for combinational logic also have notational designations; e.g.,

  AND          is designated by •, e.g., X • Y
  OR           is designated by +, e.g., X + Y
  NAND         is designated by ↑, e.g., X ↑ Y
  NOR          is designated by ↓, e.g., X ↓ Y
  XOR          is designated by ⊕, e.g., X ⊕ Y
  COINCIDENCE  is designated by ⊙, e.g., X ⊙ Y.

Note that if we form the simple composite function f̄ (NOT f, or the complement of f), then

  f̄(X) = NOT(f(X))   and   NOT(f̄) = f

Moreover,

  X ↑ Y = NOT(X • Y)   (NAND ≡ NOT AND) - the Sheffer stroke
  X ↓ Y = NOT(X + Y)   (NOR ≡ NOT OR) - the Peirce arrow
  X ⊙ Y = NOT(X ⊕ Y)   (COINC ≡ complement of XOR)

In particular, NAND and AND, OR and NOR, and XOR and COINC are respectively complementary in the sense that each is the complement of the other.

Rather than use a general graphical "logic gate" designation

  X --->+-----+
        |     |---> Z = f(X,Y)
  Y --->+-----+

ANSI (American National Standards Institute) has standardized on graphical symbols for the most commonly used logic gates.

[Figure: ANSI gate symbols for AND (•), OR (+), NAND (↑), NOR (↓), XOR (⊕), COINC (⊙), and NOT]

Page 5

Composite functions such as f(g(x)) can be easily represented using these symbols; e.g., consider the composite

  f(A,B,C,D) = ((A B̄)↑C) ⊙ ((A⊕C)↓D̄)

This is easily represented as a 3-level circuit:

[Figure: 3-level circuit for f(A,B,C,D): an AND gate forms A B̄, a NAND gate combines it with C; an XOR gate forms A⊕C, a NOR gate combines it with D̄; a final COINC gate produces f(A,B,C,D)]

The level of a circuit is the maximal number of gates an input signal has to travel through to establish the circuit output. Normally, both an input signal and its inverse are assumed to be available, so the NOT gate on B does not count as a 4th level for the circuit. Note that the behavior of the above circuit can be totally determined by evaluating its behavior for each possible input combination (we'll return to determining its values later):

B

C

D

0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1

0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

f(A,B,C,D) 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1
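The table can also be produced by brute-force evaluation of the circuit equation. A minimal Python sketch (0/1 integers stand for the logic values; the helper name f is ours):

  from itertools import product

  def f(a, b, c, d):
      t1 = 1 - ((a & (1 - b)) & c)      # (A AND NOT B) NAND C
      t2 = 1 - ((a ^ c) | (1 - d))      # (A XOR C) NOR (NOT D)
      return 1 - (t1 ^ t2)              # COINC is the complement of XOR

  for a, b, c, d in product((0, 1), repeat=4):
      print(a, b, c, d, f(a, b, c, d))  # reproduces the f column above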

Note that this table provides an exhaustive specification of the logic circuit more compactly given by the above algebraic expression for f. Its form corresponds to the "truth" tables used in symbolic logic. For small circuits, the truth table form of specifying a logic function is often used. The inputs to a logic circuit typically represent data values encoded in a binary format as a sequence of 0's and 1's. The encoding scheme may be selected to facilitate manipulation of the data. For example, if the data is numeric, it is normally encoded to facilitate performing arithmetic operations. If the data is alphabetic

Page 6

characters, it may be encoded to facilitate operations such as sorting. There are also encoding schemes to specifically facilitate effective use of the underlying hardware. A single input line is normally used to provide a single data bit of information to a logic circuit, representing the binary values of 0 or 1. At the hardware level, 0 and 1 are typically represented by voltage levels; e.g., 0 by voltage L ("low") and 1 by voltage H ("high"). For the TTL (Transistor-Transistor Logic) technology, H = +5V and L = 0V (H is also referenced as Vcc, the common collector supply voltage, and L as GND or "ground").

Representing Data

There are three fundamental types of data that must be considered:
• logical data (the discrete truth values - True and False)
• numeric data (the integers and real numbers)
• character data (the members of a defined finite alphabet)

Logical data representation: There is no imposed standard for representing logical data in computer hardware and software systems, but a single data bit is normally used to represent a logical data item in the context of logic circuits, with "True" represented by 1 and "False" by 0. This is the representation implicitly employed in the earlier discussion of combinational logic circuits, which are typically implementations of logic functions described via the mechanisms of symbolic logic. If the roles of 0 and 1 are reversed (0 representing True and 1 representing False), then the term negative logic is used to emphasize the change in representation for logical data.

Numeric data: The two types of numeric data,
• integers
• real numbers

are represented very differently. The representation in each case must deal with the fact that a computing environment is inherently finite. Integers: When integers are displayed for human consumption we use a "base representation”. This requires us to establish characters which represent the base digits. Since we have ten fingers, the natural human base is ten and the Arabic characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 are used to represent the base digits. Since logic circuits deal with binary inputs (0 or 1), the natural base in this context is two. Rather than invent new characters, the first two base ten characters (0 and 1)

Page 7 are used to represent the base two digits. Any integer can be represented in any base, so long as we have a clear understanding of which base is being used and know what characters represent its digits. For example, 1910 indicates a base ten representation of nineteen. In base two it is represented by 1 0 0 1 12. When dealing different bases, it is important to be able to convert from the representation in one base to that of the other. Note that it is easy to convert from base 2 to base 10, since each base 2 digit can be thought of as indicating the presence or absence of a power of 2. 1 0 0 1 12 = 1×24 + 0×23 + 0×22 + 1×21 + 1×20 = 16 + 0 + 0 + 2 + 1 = 1910 = 1×101 + 9×100 A conversion from base 10 to base 2 is more difficult but still straight forward. It can be handled "bottom-up" by repeated division by 2 until a quotient of 0 is reached, the remainders determining the powers of 2 that are present: 19/2 9/2 4/2 2/2 1/2

= = = = =

9 4 2 1 0

R R R R R

1 1 0 0 1

(20 (21 (22 (23 (24

is is is is is

present) present) not present) not present) present)

The conversion can also be handled "top-down" by iteratively subtracting out the highest power of 2 present until a difference of 0 is reached: 19 - 16 = no 8's no 4's 3 - 2 = 1 - 1 =

3 1 0

(1) (0) (0) (1) (1)

(16=24 ( 8=23 ( 4=22 ( 2=21 ( 1=20

is is is is is

present so remove 16) not present in what's left) not present) present so remove 2) present in what's left)
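Both conversion directions are easy to mechanize. A minimal Python sketch of the two algorithms just described (the helper names are ours):

  def to_binary(n):
      """Base 10 -> base 2, bottom-up: remainders of repeated division."""
      bits = ""
      while n > 0:
          n, r = divmod(n, 2)
          bits = str(r) + bits      # each remainder is the next bit
      return bits or "0"

  def to_decimal(bits):
      """Base 2 -> base 10: sum the powers of 2 that are present."""
      return sum(int(b) << (len(bits) - 1 - i) for i, b in enumerate(bits))

  print(to_binary(19))        # '10011'
  print(to_decimal("10011"))  # 19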

Bases which are powers of 2 are particularly useful for representing binary data since it is easy to convert to and from among them. The most commonly used are base 8 (octal), which uses as base digits 0, 1, 2, 3, 4, 5, 6, 7, and base 16 (hexadecimal), which uses as base digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, where A, B, C, D, E, F are the base digits for ten, eleven, twelve, thirteen, fourteen, and fifteen. An n-bit binary item can easily be viewed in the context of any of base 2, base 8, or base 16 simply by appropriately grouping the bits; for example, the 28-bit binary item

Page 8

  in groups of 3 (octal):   1 | 100 | 110 | 101 | 011 | 100 | 000 | 000 | 010 | 100
                            1    4     6     5     3     4     0     0     2     4

  in groups of 4 (hex):     1100 | 1101 | 0101 | 1100 | 0000 | 0001 | 0100
                              C      D      5      C      0      1      4

is easily seen to be 1465340024₈ = CD5C014₁₆ when the bits are grouped as indicated (using a calculator that handles base conversions, you can determine that the base ten value is 215334932₁₀; note that such calculators are typically limited to ten base 2 digits, but handle 8 hexadecimal digits, effectively extending the range of the calculator to 32 bits when the hexadecimal digits are viewed as 4-bit chunks). Since it is easier to read a string of hexadecimal (hex) digits than a string of 0's and 1's, and the conversion to and from base 16 is so straightforward, digital information of many bits is frequently displayed using hex digits (or sometimes octal, particularly for older equipment).

Since digital circuits generally are viewed as processing binary data, a natural way to encode integers for use by such circuits is to use fixed blocks of n bits each; in particular, 32-bit integers are commonly used (i.e., n = 32). In general, an n-bit quantity may be viewed as naturally representing one of the 2^n integers in the range [0, 2^n - 1] in its base 2 form. For example, for n = 5, there are 2^5 = 32 such numbers. The 5-bit representations of these numbers in base 2 form are

  00000₂ = 0₁₀
  00001₂ = 1₁₀
   . . .
  11111₂ = 31₁₀

Note that as listed, the representation does not provide for negative numbers. One strategy to provide for negative numbers is to mimic the "sign-magnitude" approach normally used in everyday base 10 representation of integers. For example, -273₁₀ explicitly exhibits as separate entries the sign and the magnitude of the number. A sign-magnitude representation strategy could use the first bit to represent the sign (0 for +, 1 for -). While perhaps satisfactory for everyday paper and pencil use, this strategy has awkward characteristics that weigh against it. First of all, the operation of subtraction is algorithmically vexing even for base 10 paper and pencil exercises. For example, the subtraction problem 23₁₀ - 34₁₀ is typically handled not by subtracting 34₁₀ from 23₁₀, but by first subtracting 23₁₀ from 34₁₀ and negating the result, exactly the opposite of what the problem is asking for! Even worse, 0₁₀ is represented twice (e.g., when n = 5, 0₁₀ is represented by both 0 0 0 0 0 and 1 0 0 0 0). Conceptually, the subtraction problem above can be viewed as the addition problem 23₁₀ + (-34₁₀). However, adding the corresponding sign-magnitude

Page 9

representations as base 2 quantities will yield an incorrect result in many cases. Since numeric data is typically manipulated computationally, the representation strategy should facilitate, rather than complicate, the circuitry designed to handle the data manipulation. For these reasons, when n bits are used, the resulting 2^n binary combinations are viewed as representing the integers modulo 2^n, which inherently provides for negative integers and well-defined arithmetic (modulo 2^n).

The last statement needs some explanation. First observe that in considering the number line

  ... | -2^31 | ... | -2 | -1 | 0 | 1 | 2 | ... | 2^31 - 1 | ...

truncation of the binary representation for any non-negative integer i to n bits results in i mod 2^n. Note that an infinite number of non-negative integers (precisely 2^n apart from each other) truncate to a given particular value in the range [0, 2^n - 1]; i.e., there are 2^n such groupings, corresponding to 0, 1, 2, . . . , 2^n - 1. Negative integers can be included in each grouping simply by taking integers 2^n apart without regard to sign. These groupings are called the "residue classes" modulo 2^n. Knowing any member of a residue class is equivalent to knowing all of them (just adjust up or down by multiples of 2^n to find the others, or for non-negative integers truncate the base 2 representation at n bits to find the value in the range [0, 2^n - 1]). In other words, the 2^n residue classes represented by 0, 1, 2, ..., 2^n - 1 provide a (finite) algebraic system that inherits its algebraic properties from the (infinite) integers, which justifies the viewpoint that this is a natural way to represent integer data in the context of a finite environment. Note that negative integers are implicitly provided for algebraically, since each algebraic entity (residue class) has an inverse under addition. For example, with n = 5, adding the mod 2^5 residue classes for 7₁₀ and 25₁₀ yields

  [25₁₀] + [7₁₀] = [32₁₀] = [0₁₀], so [25₁₀] = [-7₁₀]

Returning to the computing practice point of view of identifying the residue classes with the 5-bit representations of 0, 1, 2, ..., 2^5 - 1 in base 2 form, the calculation becomes 11001₂ + 00111₂ = 00000₂ (truncated to 5 bits). The evident extension of this observation is that n-bit base 2 addition conforms exactly to addition modulo 2^n, a fact that lends itself to circuit implementation.

Again referring to the number line

  ... | -16 | ... | -2 | -1 | 0 | 1 | 2 | ... | 15 | 16 | ... | 31 | 32 | ...

consider for n = 5 the following table exhibiting in base 10 the 32 residue classes modulo 2^5. Each residue class is matched to the 5-bit

Page 10

representation corresponding to its base value in the range 0, 1, 2, ..., 31:

  residue class                                 5-bit representation
  { ..., -32,  0, 32, ... } = [ 0]          ≡   0 0 0 0 0
  { ..., -31,  1, 33, ... } = [ 1]          ≡   0 0 0 0 1
  { ..., -30,  2, 34, ... } = [ 2]          ≡   0 0 0 1 0
   . . .
  { ..., -17, 15, 47, ... } = [15]          ≡   0 1 1 1 1
  { ..., -16, 16, 48, ... } = [16] = [-16]  ≡   1 0 0 0 0
   . . .
  { ...,  -2, 30, 62, ... } = [30] = [-2]   ≡   1 1 1 1 0
  { ...,  -1, 31, 63, ... } = [31] = [-1]   ≡   1 1 1 1 1

Evidently, the 5-bit representations with a leading 0 viewed as base 2 integers best represent the integers 0, 1, ..., 15. The 5-bit representations with a leading 1 best represent -16, -15, ..., -2, -1. This representation is called the 5-bit 2's complement representation. It provides for 0, 15 positive integers, and 16 negative integers. Since data normally originates in sign-magnitude form, an easy means is needed to convert to/from the sign-magnitude form. An examination of the table leads to the conclusion that finding the magnitude for a negative value in 5-bit 2's complement form can be accomplished by subtracting from 32 (1 0 0 0 0 0₂) and truncating the result. In general, this follows from the mod 2^5 residue class equivalences,

  [-i] = [-i] + [0₁₀] = [-i] + [32₁₀] = [32₁₀ - i]

which demonstrates that subtracting from 32 and truncating the result will always yield the representation for -i; -i is called the 2's complement of i. One way to subtract from 32 is to subtract from 1 1 1 1 1 (which is 31) and then add 1 (all in base 2). This is equivalent to inverting each bit and then adding 1 (in base 2) to the overall result. There is nothing special in this discussion that requires 5 bits; i.e., the same rationale is equally applicable to an n-bit environment. Hence, in general, to find the 2's complement of an integer represented in n-bit 2's complement form, invert its bits and add 1 (in base 2).

Example 1: Determine the 8-bit 2's complement representation of -37₁₀. First, the magnitude of -37₁₀ is given by 37₁₀ = 100101₂, which is 0 0 1 0 0 1 0 1 in 8-bit 2's complement form. The representation for -37₁₀ is then given by the 2's complement of 37₁₀, obtained by inverting the bits of the 8-bit representation of the magnitude and adding 1; i.e.,

Page 11

    1 1 0 1 1 0 1 0
  + 0 0 0 0 0 0 0 1
  = 1 1 0 1 1 0 1 1

i.e., -37₁₀ = 1 1 0 1 1 0 1 1 in 8-bit 2's complement form.

Example 2: Determine the (base 10) values of the 9-bit 2's complement integers

  i = 0 0 0 0 1 1 0 1 1
  j = 1 1 1 0 1 1 0 1 0
  s = i + j

For i, since the lead bit is 0, the sign is + and the magnitude of the number is directly given by its representation as a base 2 integer; i.e., i = 27₁₀. For j, since the lead bit is 1, the number is negative, so its magnitude is given by -j. Inverting j's bits and adding 1 gives

    0 0 0 1 0 0 1 0 1
  + 0 0 0 0 0 0 0 0 1
    0 0 0 1 0 0 1 1 0 = 38₁₀ = -j (j's magnitude); i.e., j = -38₁₀.

i + j (which we now know is -11₁₀) can be computed directly using ordinary base 2 addition modulo 2^9; i.e.,

  i:     0 0 0 0 1 1 0 1 1   =  27₁₀
  j:   + 1 1 1 0 1 1 0 1 0   = -38₁₀
  i+j:   1 1 1 1 1 0 1 0 1   = -11₁₀
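The invert-and-add-1 rule mechanizes directly. A minimal Python sketch reproducing both examples (the function names are ours):

  def twos_complement(value, n):
      """n-bit 2's complement representation of value, as a bit string."""
      return format(value % (1 << n), "0{}b".format(n))  # value mod 2**n

  def from_twos_complement(bits):
      """Decode a 2's complement bit string; a leading 1 means negative."""
      n, v = len(bits), int(bits, 2)
      return v - (1 << n) if bits[0] == "1" else v

  print(twos_complement(-37, 8))        # '11011011', as in Example 1
  i, j = "000011011", "111011010"       # Example 2
  s = twos_complement(from_twos_complement(i) + from_twos_complement(j), 9)
  print(s, from_twos_complement(s))     # '111110101' -11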

Example 2 illustrates that only circuitry for base 2 addition needs to be developed to perform addition and subtraction on integers represented in n-bit 2's complement form.

Historically, a variation closely related to n-bit 2's complement, namely, n-bit 1's complement, has also been used for integer representation in computing devices. The 1's complement of an n-bit block of 0's and 1's is obtained by inverting each bit. For this representation, arithmetic still requires only addition, but whenever there is a carry out of the sign position (and no overflow has occurred), 1 must be added to the result (a so-called "end-around carry", something easily achieved at the hardware level). For example, in 8-bit 1's complement

   38₁₀ =    0 0 1 0 0 1 1 0
  -27₁₀ =  + 1 1 1 0 0 1 0 0
          1  0 0 0 0 1 0 1 0
          ↳ end-around carry of the carry-out
   11₁₀ =    0 0 0 0 1 0 1 1

Note that the end-around carry is only used when working in 1's complement. Integers do not have to be represented in n-bit blocks. Another representation format is Binary Coded Decimal (BCD), where each

Page 12

decimal digit of the base 10 representation of the number is separately represented using its 4-bit binary (base 2) form. The 4-bit forms are

  0 = 0 0 0 0
  1 = 0 0 0 1
  2 = 0 0 1 0
   . . .
  9 = 1 0 0 1

so in BCD, 27 is represented in 8 bits by

  0 0 1 0 | 0 1 1 1
     2        7

and 183 is represented in 12 bits by

  0 0 0 1 | 1 0 0 0 | 0 0 1 1
     1        8         3

BCD is obviously a base 10 representation strategy. It has the advantage of being close to a character representation form (discussed below). When used in actual implementation, it is employed in sign-magnitude form (the best known of which is IBM's packed decimal form, which maintains the sign in conjunction with the last digit to accommodate the fact that the number of bits varies from number to number). Since there is no clear choice as to how to represent the sign, we will not address the sign-magnitude form further in the context of discussing BCD. It is possible to build BCD arithmetic circuitry, but it is more complex than that used for 2's complement. The arithmetic difficulties associated with BCD can easily be seen by considering what happens when two decimal digits are added whose sum exceeds 9. For example, adding 9 and 4 using ordinary base 2 yields

    1 0 0 1 =  9
  + 0 1 0 0 =  4
    1 1 0 1 = 13

which differs from 0 0 0 1 | 0 0 1 1, which is 13 in BCD. Achieving the correct BCD result from the base 2 result requires adding a correction (+6₁₀ = 0110₂); e.g.,

      1 1 0 1
    + 0 1 1 0
  0 0 0 1 | 0 0 1 1 = 13 in BCD.

In general, a correction of 6 is required whenever the sum of the two digits exceeds 9. Hence, the circuitry has to allow for the fact that

Page 13

sometimes a correction factor is required and sometimes not. Since a BCD representation is normally handled using sign-magnitude, subtraction is an added problem to cope with.
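A minimal Python sketch of the digit-at-a-time correction just described (one BCD digit position only; packing several digits into a word is omitted):

  def bcd_add_digit(a, b, carry_in=0):
      """Add two decimal digits in BCD, applying the +6 correction."""
      s = a + b + carry_in          # ordinary base 2 addition
      if s > 9:                     # not a valid BCD digit...
          s += 6                    # ...so add the correction 0110
      return s & 0xF, s >> 4        # (digit, carry out)

  digit, carry = bcd_add_digit(9, 4)
  print(carry, format(digit, "04b"))   # 1 0011, i.e., 13 in BCD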

Real numbers: Real numbers are normally represented in a format deriving from the idea of the decimal expansion, which is used in paper and pencil calculations to provide rational approximations to real numbers (this is termed a "floating point representation", since the base point separating the integer part from the fractional part may shift as operations are performed on the number). There is a defined standard for representing real numbers, the IEEE 754 Floating Point Standard, whose discussion will be deferred until later due to its complexity. An alternate representation for real numbers is to fix the number of allowed places after the base point (a so-called "fixed point representation") and use integer arithmetic. Since the number of places is fixed, the base point does not need to be explicitly represented (i.e., it is an "implied base point"). The result of applying arithmetic operations such as multiplication and division typically requires the use of additional (hidden) positions after the base point to accurately represent the result, since a fixed point format truncates any additional positions resulting from multiplication or division. For this reason precision is quickly lost, further limiting the practicality of using this format.

Character representation: Character data is defined by a finite set, its alphabet, which provides the character domain. The characters of the alphabet are represented as binary combinations of 0's and 1's. If 7 (ordered) bits are used, then the 7 bits provide 128 different combinations of 0's and 1's. Thus 7 bits provide encodings for an alphabet of up to 128 characters. If 8 bits are employed, then the alphabet may have as many as 256 characters. There are two defined standards in use in this country for representing character data:

  ASCII (American Standard Code for Information Interchange)
  EBCDIC (Extended Binary Coded Decimal Interchange Code)

ASCII has a 7-bit base definition, and an 8-bit extended version providing additional graphics characters (table page 21). In each case the standard prescribes an alphabet and its representation. Both standards have representation formats that make conversion from character form to BCD easy (for each character representing a decimal digit, the last 4 bits are its BCD representation). The representation is chosen so that when viewed in numeric ascending order, the corresponding characters follow the desired ordering for the defining alphabet, which means a numeric sort procedure can also be used for character sorting needs. Since character strings typically encompass many bits, character data is usually represented using hex digits rather than binary.

Page 14

For example, the text string "CDA 3101" is represented by

  C3 C4 C1 40 F3 F1 F0 F1    in EBCDIC
   C  D  A spc 3  1  0  1

and by

  43 44 41 20 33 31 30 31    in ASCII (or ASCII-8).

Since characters are the most easily understood measure for data capacity, an 8-bit quantity is termed a byte of storage, and data storage capacities are given in bytes rather than bits or some other measure. 2^10 = 1024 bytes is called a K-byte, 2^20 = 1,048,576 bytes is called a megabyte, 2^30 bytes is called a gigabyte, 2^40 bytes is called a terabyte, and so forth.

Other representation schemes: BCD is an example of a weighted representation scheme that utilizes the natural weighting of the binary representation of a number; i.e.,

  w₃×d₃ + w₂×d₂ + w₁×d₁ + w₀×d₀

where the digits dᵢ are just 0 or 1 and the weights are w₃=8, w₂=4, w₁=2, w₀=1. Since only 10 of the possible 16 combinations are used, d₃ is 0 for all but 2 cases (8 and 9). A variation uses w₃=2 to form what is known as "2421 BCD": d₃=0 for 0, 1, 2, 3, 4 and d₃=1 for 5, 6, 7, 8, 9. A major advantage over regular BCD is that the code is "self-complementing" in the sense that flipping the bits produces the 9's complement.

Example: subtraction by using addition. A subtraction such as 654 - 470 is awkward because of the need to borrow. The computation can be done by using addition if you think in terms of

  654 + (999 - 470) - 999 = 654 + 529 - 999 = 1183 - 999 = 184
  (drop the leading carry from 1183 to get 183, then add 1 to get 184).

999 - 470 = 529 is called the "9's complement" of 470, so the algorithm to do a subtraction A - B is
1. form the 9's complement (529) of the subtrahend B (470)
2. add it to the minuend A (654)
3. discard the carry and add 1 (corresponding to the end-around carry of 1's complement)

Note that no subtraction circuitry is needed, but the technique does need an easy way to get the 9's complement. With 2421 BCD,

  470 = 0100 1101 0000, and the 9's complement of 470 is 529 = 1011 0010 1111.

Addition is still complicated, as can be seen by adding 6 + 5, which in 2421 BCD is 1100 + 1011 = 0111 with a carry of 1 rather than 0001 0001, the 2421 BCD representation of 11 (i.e., ordinary binary addition fails).

A final BCD code, "excess-3 BCD", is also self-complementing. It is simply ordinary BCD + 3, so for the above example, with excess-3,

  470 = 0111 1010 0011, and the 9's complement of 470 is 529 = 1000 0101 1100.
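A minimal Python sketch of the 9's complement subtraction algorithm (working on ordinary integers rather than a self-complementing code, and assuming A > B so that a carry is produced in step 3):

  def nines_complement(n, digits):
      return (10 ** digits - 1) - n          # e.g., 999 - 470 = 529

  def subtract_via_nines(a, b, digits=3):
      t = a + nines_complement(b, digits)    # 654 + 529 = 1183
      return (t % 10 ** digits) + 1          # discard the carry, add 1

  print(subtract_via_nines(654, 470))        # 184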

Page 15

The lesson to learn is that codes must be formulated to represent data in a computer, and different representations are employed for different purposes; e.g.,
• 2's complement is a number representation that facilitates arithmetic in base 2
• BCD is another number representation that facilitates translation of numbers to decimal character form but complicates arithmetic
• ASCII represents characters in a manner that facilitates upper-case/lower-case adjustment and ease of conversion of decimal characters
• Other schemes such as "2421 BCD" and "excess-3 BCD" seek to improve decimal arithmetic by facilitating use of the 9's complement to avoid subtraction

Sometimes representation schemes are designed to facilitate other tasks, such as representing graphical data elements or for tracking. For example, Gray code is commonly used for identifying sectors on a rotating disk. Gray code is defined recursively by using the rule: to form the (n+1)-bit representation from the n-bit representation,
• preface the n-bit representation by 0
• append to this the n-bit representation in reverse order prefaced by 1

Hence, the 1, 2, and 3-bit representations are

  1-bit   2-bit   3-bit
    0      00     000
    1      01     001
           11     011
           10     010
                  110
                  111
                  101
                  100
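The recursive rule translates directly into code. A minimal Python sketch:

  def gray(n):
      """n-bit Gray code via the reflect-and-prefix rule above."""
      if n == 1:
          return ["0", "1"]
      prev = gray(n - 1)
      # prefix the (n-1)-bit code with 0, then its reverse with 1
      return ["0" + c for c in prev] + ["1" + c for c in reversed(prev)]

  print(gray(3))  # ['000', '001', '011', '010', '110', '111', '101', '100']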

Consider three concentric disks shaded as follows:

[Figure: three concentric rings divided into 8 wedge-shaped sectors, shaded so that reading the rings of each sector gives the 3-bit Gray code sequence 000, 001, 011, 010, 110, 111, 101, 100 around the disk]

Page 16

The shading provides a Gray code identification for 8 distinct wedge-shaped sections on the disk. As the disk rotates from one section to the next, no more than one digit position (represented by shaded and unshaded segments) changes, simplifying the task of determining the id of the next section when going from one section to the next. Note that this is a characteristic of the Gray code. In contrast, note that in regular binary the transition from 3 to 4, 011 to 100, changes all 3 digits, which means hardware tracking the change if this representation were used would potentially face arbitrary intermediate patterns in the transition from section 3 to section 4, complicating the process of determining that 4 is the id of the next section (e.g., something such as a delay would have to be added to the control circuitry to allow the transition to stabilize). For a disk such as the above, a row of 3 reflectance sensors, one for each concentric band, can be used to track the transitions.

Boolean algebra: Boolean algebra is the algebra of circuits, the algebra of sets, and the algebra of truth table logic. A Boolean algebra has two fundamental elements, a "zero" and a "one," whose properties are described below. For circuits "zero" is designated by 0 or L (for low voltage) and "one" by 1 or H (for high voltage). For sets, "zero" is the empty set and "one" is the set universe. For truth table logic, "zero" is designated by F (for false) and "one" by T (for true). Just as the algebraic properties of numbers are described in terms of fundamental operations (addition and multiplication), the algebraic properties of a Boolean algebra are described in terms of basic Boolean operations. For circuits, the basic Boolean operations are ones we've already discussed:

  AND (•), OR (+), and complement (overbar)

For sets the corresponding operations are intersection (∩), union (∪), and set complement. For truth table logic they are AND (∧), OR (∨), and NOT (~). Recall that AND and OR are binary operations (an operation requiring two arguments), while complement is a unary operation (an operation requiring one argument).

Page 17

For circuits, also recall that
– the multiplication symbol • is used for AND
– the addition symbol + is used for OR
– the symbol for complement is an overbar; i.e., X̄ designates the complement of X.

The utilization of • for AND and + for OR is due to the fact that these Boolean operations have algebraic properties similar to (but definitely not the same as) those of multiplication and addition for ordinary numbers. Basic properties for Boolean algebras (using the circuit operation symbols, rather than those for sets or for symbolic logic) are as follows:

1. Commutative property: + and • are commutative operations; e.g.,

     X + Y = Y + X   and   X • Y = Y • X

   In contrast to operations such as subtraction and division, a commutative operation has a left-right symmetry, permitting us to ignore the order of the operation's operands.

2. Associative property: + and • are associative operations; e.g.,

     X + (Y + Z) = (X + Y) + Z   and   X • (Y • Z) = (X • Y) • Z

   Non-associative operations (such as subtraction and division) tend to cause difficulty precisely because they are non-associative. The property of associativity permits selective omission of parentheses, since the order in which the operation is applied has no effect on the outcome; i.e., we can just as easily write X + Y + Z as X + (Y + Z) or (X + Y) + Z since the result is the same whether we first evaluate X + Y or Y + Z.

3. Distributive property: • distributes over + and + distributes over •; e.g.,

     X • (Y + Z) = (X • Y) + (X • Z)   and also   X + (Y • Z) = (X + Y) • (X + Z)

   With the distributive property we see a strong departure from the algebra of ordinary numbers, which definitely does not have the property of + distributing over •. The distributive property illustrates a strong element of symmetry that occurs in Boolean algebras, a characteristic known as duality.

4. Zero and one: there is an element zero (0) and an element one (1) such that for every X,

     X + 1 = 1   and   X • 0 = 0

Page 18

5. Identity: 0 is an identity for + and 1 is an identity for •; e.g., for every X,

     X + 0 = X   and   X • 1 = X

6. Complement property: every element X has a complement X̄ such that

     X + X̄ = 1   and   X • X̄ = 0

   The complement of 1 is 0 and vice-versa; it can be shown that in general complements are unique; i.e., each element has exactly one complement.

7. Involution property (rule of double complements): for each X,

     NOT(X̄) = X

8. Idempotent property: for every element X,

     X + X = X   and   X • X = X

9. Absorption property: for every X and Y,

     X + (X • Y) = X   and   X + (X̄ • Y) = X + Y

   Anything "AND"ed with X is absorbed into X under "OR" with X. Anything "AND"ed with X̄ is absorbed in its entirety under "OR" with X.

10. DeMorgan property: for every X and Y,

      NOT(X • Y) = X̄ + Ȳ   and   NOT(X + Y) = X̄ • Ȳ

    The DeMorgan property describes the relationship between "AND" and "OR", which with the rule of double complements allows expressions to be converted from use of "AND"s to use of "OR"s and vice-versa; e.g.,

      X + Y = NOT(NOT(X + Y)) = NOT(X̄ • Ȳ)
      X • Y = NOT(NOT(X • Y)) = NOT(X̄ + Ȳ)

Some of these properties can be proven from others (i.e., they do not constitute a minimal defining set of properties for Boolean algebras); for example, the idempotent rule X + X = X can be obtained by the manipulation X + X = X + (X • 1) = X using the identity and absorption properties. The DeMorgan property provides rules for using NANDs and NORs (where NAND stands for "NOT AND" and NOR stands for "NOT OR"). The operation NAND (sometimes called the Sheffer stroke) is denoted by

Page 19

  NOT(X • Y) = X ↑ Y

and the operation NOR (sometimes called the Peirce arrow) is denoted by

  NOT(X + Y) = X ↓ Y

Utilizing the rule of double complements and the DeMorgan property, any expression can be written in terms of the complement operation and ↑, or the complement operation and ↓. Moreover, since the complement can be written in terms of either ↑ or ↓; i.e.,

  X̄ = X ↑ X = X ↓ X

any Boolean expression can be written solely in terms of either ↑ or solely in terms of ↓. This observation is particularly significant for a circuit whose function is represented by a Boolean expression, since this property of Boolean algebra implies that the circuit construction can be accomplished using as basic circuit elements only NAND circuits or only NOR circuits. Note that properties such as commutative and associative are also characteristic of the algebra of numbers, but others, such as the idempotent and DeMorgan properties, are not; i.e., Boolean algebra, the algebra of circuits, has behaviors quite different from what we are used to with numbers. Just as successfully working with numbers requires gaining understanding of their algebraic properties, working with circuits requires gaining understanding of Boolean algebra. In working with numbers, just as we often omit writing the times symbol × in formulas, we may omit the AND symbol • in formulas.

Examples:

1. There is no cancellation; i.e., XY = XZ does not imply that Y = Z (if it did, the idempotent property XX = X = X • 1 would imply that X = 1!)

2. Complements are unique. To see this just assume that Y is also a complement for X; i.e.,
   – X + Y = 1 and XY = 0.
   – AND the 1st equation through with X̄ to get X̄X + YX̄ = X̄
   – Since X̄X = 0, this reduces to YX̄ = X̄
   – Similarly, since X + X̄ = 1 and XY = 0, XY + X̄Y = Y reduces to X̄Y = Y
   – Putting the last two lines together we have X̄ = Y

3. The list of properties is not minimal; e.g.,
   – Given that the properties other than the idempotent property are true, it can be shown that the idempotent property is also true as follows: X + X̄ = 1, so using the distributive property, XX + XX̄ = X, which in turn leads to

Page 20

     XX = X since XX̄ = 0. A similar argument can be used to show that X + X = X.
   – Given that the properties other than the absorption property are true, it can be shown that the absorption property is also true as follows:
     Since 1 + Y = 1, X + XY = X, the 1st absorption criterion.
     Starting from X + X̄ = 1 we get XY + X̄Y = Y.
     Adding X to both sides we get X + XY + X̄Y = X + Y.
     By the first absorption criterion this reduces to X + X̄Y = X + Y, which is the 2nd absorption criterion.

The DeMorgan property has great impact on circuit equations, since it provides the formula for converting from OR to NAND and from AND to NOR. The above proofs are by logical deduction. For a 2-element Boolean algebra, proof can also be done exhaustively by examining all cases; e.g., we can verify DeMorgan by means of a "truth table":

  X Y | X̄ Ȳ | X̄ • Ȳ | X + Y | NOT(X + Y)
  0 0 | 1 1 |    1    |   0   |     1
  0 1 | 1 0 |    0    |   1   |     0
  1 0 | 0 1 |    0    |   1   |     0
  1 1 | 0 0 |    0    |   1   |     0

This is called a "brute force" method for verifying the equation NOT(X + Y) = X̄ • Ȳ because it exhaustively checks every case using the definition of the AND, OR and NOT operations.

Since AND and OR are associative, X • Y • Z and X + Y + Z can be written unparenthesized. It can be shown that

  NOT(X • Y • Z) = X̄ + Ȳ + Z̄   and   NOT(X + Y + Z) = X̄ • Ȳ • Z̄

This leads to the "generalized DeMorgan property":

  NOT(X₁ X₂ . . . Xₙ) = X̄₁ + X̄₂ + . . . + X̄ₙ
  NOT(X₁ + X₂ + . . . + Xₙ) = X̄₁ X̄₂ . . . X̄ₙ

which is often useful for circuits of more than 2 variables. There are multi-input NAND gates to take advantage of this property.

WARNING: NAND and NOR are not associative.

Page 21

Consider the truth table:

  X Y Z | X↑Y | Y↑Z | X↑(Y↑Z) | (X↑Y)↑Z | NOT(X•Y•Z)
  0 0 0 |  1  |  1  |    1    |    1    |     1
  0 0 1 |  1  |  1  |    1    |    0    |     1
  0 1 0 |  1  |  1  |    1    |    1    |     1
  0 1 1 |  1  |  0  |    1    |    0    |     1
  1 0 0 |  1  |  1  |    0    |    1    |     1
  1 0 1 |  1  |  1  |    0    |    0    |     1
  1 1 0 |  0  |  1  |    0    |    1    |     1
  1 1 1 |  0  |  0  |    1    |    1    |     0

It is evident that (X↑(Y↑Z)) ≠ ((X↑Y)↑Z) ≠ NOT(X • Y • Z). Similarly, (X↓(Y↓Z)) ≠ ((X↓Y)↓Z) ≠ NOT(X + Y + Z). This means that care must be taken in grouping the NAND (↑) and NOR (↓) operators in algebraic expressions!
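Claims like these can be checked exhaustively in a few lines, exactly as the tables above and below do by hand. A minimal Python sketch testing all four operations over every 0/1 assignment:

  from itertools import product

  nand  = lambda a, b: 1 - (a & b)
  nor   = lambda a, b: 1 - (a | b)
  xor   = lambda a, b: a ^ b
  coinc = lambda a, b: 1 - (a ^ b)

  for name, op in [("NAND", nand), ("NOR", nor), ("XOR", xor), ("COINC", coinc)]:
      assoc = all(op(x, op(y, z)) == op(op(x, y), z)
                  for x, y, z in product((0, 1), repeat=3))
      print(name, "associative" if assoc else "NOT associative")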

The other two common binary operations, XOR (⊕) and COINC (⊙), are both associative.

  X Y Z | X⊕Y | Y⊕Z | (X⊕Y)⊕Z | X⊕(Y⊕Z) | X⊙Y | Y⊙Z | (X⊙Y)⊙Z | X⊙(Y⊙Z)
  0 0 0 |  0  |  0  |    0    |    0    |  1  |  1  |    0    |    0
  0 0 1 |  0  |  1  |    1    |    1    |  1  |  0  |    1    |    1
  0 1 0 |  1  |  1  |    1    |    1    |  0  |  0  |    1    |    1
  0 1 1 |  1  |  0  |    0    |    0    |  0  |  1  |    0    |    0
  1 0 0 |  1  |  0  |    1    |    1    |  0  |  1  |    1    |    1
  1 0 1 |  1  |  1  |    0    |    0    |  0  |  0  |    0    |    0
  1 1 0 |  0  |  1  |    0    |    0    |  1  |  0  |    0    |    0
  1 1 1 |  0  |  0  |    1    |    1    |  1  |  1  |    1    |    1

Generalized operations (multi-input) serve to reduce the number of levels in a circuit; e.g., a 3-input AND is a 1-level circuit for XYZ equivalent to the 2-level circuit (XY)Z:

[Figure: a 2-level circuit of two 2-input AND gates computing (XY)Z alongside the equivalent 1-level circuit using a single 3-input AND gate computing XYZ]

Page 22

Canonical forms: Any combinational circuit, regardless of the gates used, can be expressed in terms of combinations of AND, OR, and NOT. The most general form of this expression is called a canonical form. There are two types:
– the canonical sum of products
– the canonical product of sums

Formulating these turns out to be quite easy if the truth table for the circuit is constructed. For example, consider a circuit f(X,Y,Z) with specification:

  X Y Z | f(X,Y,Z) | X̄ Ȳ Z̄ | X Ȳ Z | X Y Z̄
  0 0 0 |    1     |    1    |   0    |   0
  0 0 1 |    0     |    0    |   0    |   0
  0 1 0 |    0     |    0    |   0    |   0
  0 1 1 |    0     |    0    |   0    |   0
  1 0 0 |    0     |    0    |   0    |   0
  1 0 1 |    1     |    0    |   1    |   0
  1 1 0 |    1     |    0    |   0    |   1
  1 1 1 |    0     |    0    |   0    |   0

Note that

  f(X,Y,Z) = X̄ Ȳ Z̄ + X Ȳ Z + X Y Z̄

Each of these terms is obtained just by looking at the combinations for which f(X,Y,Z) is 1. Each such term is called a minterm. There are 8 possible minterms for 3 variables (see below). Analogously, for the combinations for which f(X,Y,Z) is 0 we get

  f(X,Y,Z) = (X+Y+Z̄)(X+Ȳ+Z)(X+Ȳ+Z̄)(X̄+Y+Z)(X̄+Ȳ+Z̄)

Each of these terms is obtained just by looking at the combinations for which f(X,Y,Z) is 0. Each such term is called a maxterm. There are 8 possible maxterms for 3 variables (see below). The minterms and maxterms are numbered from 0 corresponding to the binary combination they represent.

      X Y Z   minterm    maxterm
  0.  0 0 0   X̄ Ȳ Z̄    X + Y + Z
  1.  0 0 1   X̄ Ȳ Z    X + Y + Z̄
  2.  0 1 0   X̄ Y Z̄    X + Ȳ + Z
  3.  0 1 1   X̄ Y Z     X + Ȳ + Z̄
  4.  1 0 0   X Ȳ Z̄    X̄ + Y + Z
  5.  1 0 1   X Ȳ Z     X̄ + Y + Z̄
  6.  1 1 0   X Y Z̄     X̄ + Ȳ + Z
  7.  1 1 1   X Y Z      X̄ + Ȳ + Z̄
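The numbering scheme is mechanical. A minimal Python sketch that regenerates the table (using ~ as a plain-text stand-in for the overbar):

  from itertools import product

  VARS = "XYZ"
  for i, bits in enumerate(product((0, 1), repeat=3)):
      minterm = "".join(v if b else "~" + v for v, b in zip(VARS, bits))
      maxterm = "+".join("~" + v if b else v for v, b in zip(VARS, bits))
      print(i, minterm, maxterm)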

Page 23

Note that the maxterms are just the complements of their corresponding minterms. Representing a function by using its minterms is called the canonical sum of products, and by using its maxterms the canonical product of sums; i.e.,

  f(X,Y,Z) = X̄ Ȳ Z̄ + X Ȳ Z + X Y Z̄

is the canonical sum of products and

  f(X,Y,Z) = (X+Y+Z̄)(X+Ȳ+Z)(X+Ȳ+Z̄)(X̄+Y+Z)(X̄+Ȳ+Z̄)

is the canonical product of sums for the function f(X,Y,Z). The short-hand notation (Σ-notation)

  f(X,Y,Z) = Σ(0,5,6)

is used for the canonical sum of products. Similarly, the short-hand notation (Π-notation)

  f(X,Y,Z) = Π(1,2,3,4,7)

is used for the canonical product of sums. Canonical representations are considered to be 2-level representations, since for most circuits a signal and its opposite are both available as inputs. A combinational circuit's behavior is specified by one of
– a truth table listing the outputs for every possible combination of input values
– a canonical representation of the outputs using Σ or Π notation
– a circuit diagram using logic gates

Converting to NANDs or NORs: For a Boolean algebra, notice that
– the complement X̄ is given by (X↑X)
– since XY is given by the complement of (X↑Y), we have XY = (X↑Y)↑(X↑Y)
– by DeMorgan, X + Y = NOT(X̄ • Ȳ) = X̄ ↑ Ȳ = (X↑X)↑(Y↑Y)

Hence, we can rewrite an equation using AND, OR, and complement solely in terms of NANDs using the above conversions. Similarly, for NOR we have the conversions
– X̄ = (X↓X)
– X + Y = (X↓Y)↓(X↓Y)
– XY = NOT(X̄ + Ȳ) = X̄ ↓ Ȳ = (X↓X)↓(Y↓Y)   (by DeMorgan)

By DeMorgan, a NAND gate is equivalent to an OR gate with inverted inputs (NOT(X • Y) = X̄ + Ȳ), and a NOR gate is equivalent to an AND gate with inverted inputs (NOT(X + Y) = X̄ • Ȳ).
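The conversions can likewise be verified exhaustively. A minimal Python sketch building NOT, AND, and OR out of NAND alone and checking the constructions against the ordinary operations:

  from itertools import product

  nand = lambda a, b: 1 - (a & b)

  not_ = lambda x: nand(x, x)                        # X̄ = X ↑ X
  and_ = lambda x, y: nand(nand(x, y), nand(x, y))   # XY = (X↑Y)↑(X↑Y)
  or_  = lambda x, y: nand(nand(x, x), nand(y, y))   # X+Y = (X↑X)↑(Y↑Y)

  assert all(not_(x) == 1 - x for x in (0, 1))
  assert all(and_(x, y) == (x & y) and or_(x, y) == (x | y)
             for x, y in product((0, 1), repeat=2))
  print("NAND-only constructions verified")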

Page 24

Using these equivalences, an OR-AND (product of sums) combination can be converted to NOR-NOR:

  OR-AND ≡ NOR-NOR

Other equivalences to OR-AND that follow from this one are NAND-AND and AND-NOR:

  AND-NOR ≡ NAND-AND

For the sum of products (AND-OR) we have the counterpart equivalences:

  AND-OR ≡ NAND-NAND
  OR-NAND ≡ NOR-OR

[Figure: gate-level diagrams for each of these 2-level equivalences]

Page 25

At this point, if given a truth table, or a representation using Σ or Π notation, we can generate a 2-level circuit diagram as the canonical sum of products or product of sums. Similarly, given a circuit diagram, we can produce its truth table. This process is called circuit analysis. For example, recall that the circuit equation

  f(A,B,C,D) = ((A B̄)↑C) ⊙ ((A⊕C)↓D̄)

was earlier represented as a 3-level circuit. From the circuit equation we can obtain the truth table as follows, conforming to the values given earlier:

  A B C D | A B̄ | (A B̄)↑C | A⊕C | (A⊕C)↓D̄ | f(A,B,C,D)
  0 0 0 0 |  0   |    1     |  0  |    0     |     0
  0 0 0 1 |  0   |    1     |  0  |    1     |     1
  0 0 1 0 |  0   |    1     |  1  |    0     |     0
  0 0 1 1 |  0   |    1     |  1  |    0     |     0
  0 1 0 0 |  0   |    1     |  0  |    0     |     0
  0 1 0 1 |  0   |    1     |  0  |    1     |     1
  0 1 1 0 |  0   |    1     |  1  |    0     |     0
  0 1 1 1 |  0   |    1     |  1  |    0     |     0
  1 0 0 0 |  1   |    1     |  1  |    0     |     0
  1 0 0 1 |  1   |    1     |  1  |    0     |     0
  1 0 1 0 |  1   |    0     |  0  |    0     |     1
  1 0 1 1 |  1   |    0     |  0  |    1     |     0
  1 1 0 0 |  0   |    1     |  1  |    0     |     0
  1 1 0 1 |  0   |    1     |  1  |    0     |     0
  1 1 1 0 |  0   |    1     |  0  |    0     |     0
  1 1 1 1 |  0   |    1     |  0  |    1     |     1

From the truth table

  f(A,B,C,D) = Σ(1,5,10,15) = Π(0,2,3,4,6,7,8,9,11,12,13,14)

Note that the canonical representations are not as compact as the original circuit equation.

Circuit simplification: A circuit represented in a canonical form (usually by Σ or Π notation) can usually be simplified. There are 3 techniques commonly employed:
– algebraic reduction
– Karnaugh maps (K-maps)
– the Quine-McCluskey method

Page 26

Algebraic reduction is limited by the extent to which one is able to observe potential combinations in examining the equation; e.g.,

  Ā B C̄ D + Ā B C D + A B C̄ D
    = Ā B C̄ D + Ā B C D + Ā B C̄ D + A B C̄ D   (idempotent)
    = Ā B D (C̄ + C) + (Ā + A) B C̄ D              (distributive)
    = Ā B D • 1 + B C̄ D • 1                        (complement)
    = Ā B D + B C̄ D                                (identity)

This is a minimal 2-level representation for the circuit. The further algebraic reduction to (Ā + C̄)BD produces a 2-level circuit dependent only on 2-input gates.

The Quine-McCluskey method is an extraction from the K-map approach abstracted for computer implementation. It is not dependent on visual graphs and is effective no matter the number of inputs. Since it does not lend itself to hand implementation for more than a few variables, it will only be discussed later and in sketchy detail.

For circuits with no more than 4 or 5 input variables, K-maps provide a visual technique for effectively reducing a combinational circuit to a minimal form. The idea for K-maps is to arrange minterms whose value is 1 (or maxterms whose value is 0) on a grid so as to locate patterns which will combine. For a 1-variable map, input variable X, the minterm locations are as follows:

   X:    0    1
       | X̄ |  X |

While a 1-variable map is not useful, it is worth including to round out the discussion of maps using more variables. A 2-variable map, input variables X and Y, has minterm locations

        Y:    0      1
  X: 0     | X̄ Ȳ | X̄ Y |
     1     | X Ȳ  | X Y  |

In general we only label the cells according to the binary number they correspond to in the truth table (the number used by the Σ or Π notations). The map structure is then:

        Y:   0   1
  X: 0     | 0 | 1 |
     1     | 2 | 3 |

Page 27

For example, if we have f(X,Y) = Σ(1,3), we mark the minterms for 1 and 3 in the 2-variable map as follows:

        Y:   0   1
  X: 0     |   | 1 |
     1     |   | 1 |

Now we can graphically see that a reduction is possible by delineating the adjacent pair of minterms (corresponding to X̄Y + XY), which in fact reduces to Y. Notice that there are visual clues: the 1 over the column corresponds to Y and, looking down vertically, the 0 and 1 "cancel". 2-variable K-maps also are not particularly useful, but again are illustrative. With 3 variables, the pattern is

         YZ:  00   01   11   10
  X: 0     |  0  |  1  |  3  |  2  |
     1     |  4  |  5  |  7  |  6  |

The key thing to note is that the order across the top follows the Gray code pattern so that there is exactly one 0-1 matchup between each column, including a match between the 1st and 4th columns. For the function f(X,Y,Z) = Σ(1,3,4,6), the K-map is

         YZ:  00   01   11   10
  X: 0     |     |  1  |  1  |     |
     1     |  1  |     |     |  1  |

  f(X,Y,Z) = X̄ Z + X Z̄

The 1st term of the reduced form for f(X,Y,Z) is in the X̄ row (flagged by 0) and the 2nd is in the X row (flagged by 1). In each case the Y term cancels since it is the one with 0 matched to 1. Pay particular attention to the box that wraps around.

Page 28

For a more complex example, consider f(X,Y,Z) = Σ(1,3,4,5):

         YZ:  00   01   11   10
  X: 0     |     |  1  |  1  |     |
     1     |  1  |  1  |     |     |

Here f(X,Y,Z) can be reduced to either of the following:
  – f(X,Y,Z) = X̄ Z + X Ȳ
  – f(X,Y,Z) = X̄ Z + X Ȳ + Ȳ Z

Note that the term Ȳ Z is "redundant" since its 1's are covered by the other two terms. The first expression is called a minimal sum of products expression for f(X,Y,Z) since it cannot be reduced further. For combinational circuits, the redundant term can be omitted, but sometimes in the context of sequential circuits, where intermediate values matter, it must be left in.
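The redundancy claim can be confirmed by brute force. A minimal Python sketch checking that adding the consensus term ȲZ leaves the function unchanged:

  from itertools import product

  minimal   = lambda x, y, z: ((1 - x) & z) | (x & (1 - y))   # X̄Z + XȲ
  redundant = lambda x, y, z: minimal(x, y, z) | ((1 - y) & z)

  rows = list(product((0, 1), repeat=3))
  print([4*x + 2*y + z for x, y, z in rows if minimal(x, y, z)])  # [1, 3, 4, 5]
  assert all(minimal(*r) == redundant(*r) for r in rows)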

With 4 variables, the K-map pattern is

          CD:  00   01   11   10
  AB: 00    |  0  |  1  |  3  |  2  |
      01    |  4  |  5  |  7  |  6  |
      11    | 12  | 13  | 15  | 14  |
      10    |  8  |  9  | 11  | 10  |

Now the Gray code pattern of the rows must also be present for the columns. More complex situations can also arise; for example, the K-map

          CD:  00   01   11   10
  AB: 00    |  1  |     |     |  1  |
      01    |     |     |  1  |  1  |
      11    |     |  1  |  1  |     |
      10    |  1  |  1  |     |     |

Page 29

describes f(A,B,C,D) = Σ(0,2,6,7,8,9,13,15). There are two patterns present that produce a minimal number of terms: the 1's can be paired within the rows (0 with 2, 6 with 7, 13 with 15, 8 with 9) or within the columns (0 with 8, 2 with 6, 7 with 15, 9 with 13). Hence, either of the following produces a minimal sum of products expression:

  from the rows:     f(A,B,C,D) = Ā B̄ D̄ + Ā B C + A B D + A B̄ C̄
  from the columns:  f(A,B,C,D) = B̄ C̄ D̄ + Ā C D̄ + B C D + A C̄ D

In either case we know we have the function since all 1's are covered. When working with maxterms, the 0's of the function are what is considered. For the function above, f(A,B,C,D) = Π(1,3,4,5,10,11,12,14) and the K-map is CD 00 01 11 10 AB 0

00 4

01

0

11

0

2

5

7

6

13

15

0

0 12

14

0 8

10

3

1

0

9

10

11

0

0

leading to the following two minimal product of sums expressions: __ __ __ __ __ __ – f(A,B,C,D) = (A+B+ D )(A+ B +C)(A + B +D)( A +B+ C ) from the rows __ __ __ __ __ __ – f(A,B,C,D) = ( B +C+D)(A+C+ D )(B+ C + D )( A + C +D) from the columns. Be sure to observe that when working with maxterms, "barred" items correspond to 1's and unbarred items correspond to 0's, exactly the opposite of what is done when working with minterms. Just as a 4-variable K-map is formed by combining two 3-variable maps, a 5-variable K-map can be formed by combining two 4-variable maps

Page 30 (conceptually, 1 on top of the other, representing 0 and 1 for the 5th variable). In general, blocks of size 2n are the ones that can be reduced. are blocks of size 4 on a 4-variable K-map: CD CD 00 01 11 10 00 01 11 10 AB AB 1

0

3

2

0

1

3

2

4

5

7

6

15

14

11

10

00

00 5

4

7

6

01

01 13

12

1

11

1

15

9

8

11

1

11

00

01 0

11 1

3

2

5

7

CD

6

00

01

11 1

0

1

10 3

2

1

4

5

7

6

12

13

15

14

11

10

01

01 12

13

15

14

11

11 8

10

AB

00

1 4

1

f(A,B,C,D) = AD

10

1

9

1

f(A,B,C,D) = AB 00

1

8

10

10

CD

13

12

14

1

1

10

AB

Here

1

9

11

__ __ f(A,B,C,D) = B D

9

8

10

1

10

1

1

__ f(A,B,C,D) = B D

In each case, the horizontal term with 0 against 1 is omitted and the vertical term with 0 against 1 is omitted. Be sure to pay particular attention to the pattern with a 1 in each corner, where A is omitted vertically and C is omitted horizontally. Note that each block of 4 contains 4 blocks of 2, but these are not diagrammed since they are absorbed (in contrast, the Quine-McCloskey method, which we won’t look at until later, does keep tabs on all such blocks!). In general, an implicant (implicate for 0's) is a term that is a product of inputs (including complements) for which the function evaluates to 1 whenever the term evaluates to 1. These are represented by blocks of size 2n on K-maps.

Page 31 A prime implicant (implicate for 0's) is one not contained in any larger blocks of 1's. An essential prime implicant is a prime implicant containing a 1 not covered by any other prime implicant. A distinguished cell is a 1-cell covered by exactly 1 prime implicant. A don't care cell is one that may be either 0 or 1 for a particular circuit. The value used in K-map analysis is one which increases the amount of reduction. Don't care conditions occur because in circuits, there are often combinations of inputs that cannot occur, so we don't care whether their values are 0 or 1. General Procedure for Circuit Reduction Using K-maps 1. 2. 3. 4. 5.

6. 7. 8.

Map the circuit's function into a K-map, marking don't cares by using dashes Treating don't cares as if they were 1's (0's for implicates), box in all prime implicants (implicates), omitting any consisting solely of dashes. Mark any distinguished cells with * (dashes don't count) Include all essential prime implicants in the sum, change their 1's to dashes and remove their boxes - exit if there aren't any more 1's at this point. Remove any prime implicants whose 1's are contained in a box having more 1's (dominated case) – if there is a case where the number of 1's is the same (codominant case), discard the smaller box – if there is a case where the number of 1's is the same and the box sizes are the same, discard either. Go back to step 3 if there are any new distinguished cells Include the largest of the remaining prime implicants in the sum and go back to step 4 (this step is rarely needed) - if there is no largest, choose any If step 7 was used, choose from among the possible sums the one with the fewest terms, then the one using the fewest variables.

Remark: if this procedure is employed with the K-map CD 00 01 11 10 AB 1

0

00

5

4

13

12

1

11

1

6

1 15

14

11

10

1 9

8

step 7 will be employed.

7

1

1

2

1

01

10

3

1

Page 32 Worked out example: AB

CD

00

01

00 01

*1

5

1 7

6

15

14

1 1

1

*1 9

8

10

2

1

13

12

10 3

1

1 4

11

11 1

0

10

11

1

1

1

__ __ There are 2 essential prime implicants to put in the sum: A D + A D Now change the 1's in these 2 boxes to don't cares and redraw the map: CD 00 01 11 10 AB 1

0

00

-

1

-

10

1 7

6

15

14

1

9

8

2

13

12

11

5

4

01

3

11

1

-

10

-

The map has 2-sets of co-dominant implicants, so pick one of the codominant boxes from each and delete it; mark distinguished cells. CD 00 01 11 10 AB 00

*1

0

1

4

5

-

01 12

11

-

10

-

3

-

2

6

7

13

9

14

15

1* 8

1*

10

11

1*

-

Adding in the new essential prime implicants covers all 1's so __ __ __ __ f(A,B,C,D) = A D + A D + B D + A C

Page 33 We earlier considered the circuit analysis process, where given a circuit diagram, it can be converted into a circuit equation based on the gates employed, and from there converted into a truth table. The circuit design process proceeds as follows: 1. Formalize the problem statement into inputs and outputs, devising representations for inputs and outputs 2. Translate the problem statement to a logic function 3. Determine the outputs corresponding to inputs (some of which may be don’t cares) 4. Convert to Σ or Π notation (truth table optional), including any don’t cares – Example: if f(A,B,C) = 1 for 0,3,4 and 1,5 are don’t cares, then the circuit is given by either of f(A,B,C) = Σ(0,3,4) + d(1,5) or f(A,B,C) = Π(2,6,7) + d(1,5) 5. Create a K-map from the Σ or Π notation 6. Use K-map reduction to obtain a minimal circuit equation 7. Produce a circuit diagram from the circuit equation Employing XOR gates requires manipulation of the circuit equation. Employing NAND and NOR gates can be accomplished by adjusting the circuit diagram. [Recall that using the equivalences a NAND gate

is equivalent to

and a NOR gate

is equivalent to

there are diagrammatic techniques for converting sum of products and product of sums expressions to ones using NAND and NOR]. Example: (circuit design) Design a matching circuit for the following: There are 3 types of ball bearings in a bin (plastic, steel, and brass). An assembly machine needs ball bearings of each type at different points in the assembly process. Given the type of ball bearing it needs at present, it needs to look through the bin for a ball bearing matching the type; ie., Needed type Accept/Reject Observed type

Page 34 Step 1: Formalize Type Representation Plastic Steel Brass

01 10 11

Accept = 1 Reject = 0

Steps 2,3,4: Translate to logic function Needed obsrv’d A

B

C

D

0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1

0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

f(A,B,C,D) = Σ(5,10,15) + d(0,1,2,3,4,8,12) = Π(6,7,9,11,13,14) + d(0,1,2,3,4,8,12)

d d d d d 1 0 0 d 0 1 0 d 0 0 1

Steps 5: K-map reduction CD 00 01 11 AB 1

0

00

-

01

-

11

-

10

-

4

12

8

*1

10 -

5

13

9

7

*1

2

3

15

11

AB

CD

00

00

-

01

-

11

-

10

-

-

10 3

2

5

7

6

0

0

* 13

12

14

11 1

4

6

* 1 10

01 0

15

0

*

*

9

8

0

14

0

11

0

*

__ __ __ __ Step 6: Circuit equation – f(A,B,C,D) = A__C + ABCD + B__D __ __ or f(A,B,C,D) = ( A +C)(B+ D )(A+ C )( B +D) Step 7: Circuit diagram (there are 2 obvious NORs) __________ __________ f(A,B,C,D) = A + C + ABCD + B + D

10

Page 35 __________ A + C A B

ABCD

f(A,B,C,D)

C D __________ B + D

Example: (circuit design) Design a combinational circuit to convert 3-bit Gray code to 3-bit binary (this is called a Gray to binary decoder). X Y Z

Gray in

X

A B C

X

Y

Z

A

B

C

0 0 0 0 1 1 1 1

0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1

0 0 0 0 1 1 1 1

0 0 1 1 1 1 0 0

0 1 1 0 1 0 0 1

YZ

00

01

11 1

0

0 1

1

1

X

1

X

2

1 6

7

1

YZ

00

01

11

10

0

1

3

2

4

5

7

6

0

3

5

4

A = Σ(4,5,6,7), B=Σ(2,3,4,5), C=Σ(1,2,4,7)

10

1

Binary out

1

K-map for B

YZ

1

1

00

0

11 1

10 3

1 4

1

1

1

01 0

1

2

1 5

K-map for A

7

6

1

A = X __ __ B = X Y + X Y = X⊕Y __ __ __ __ __ __ C = X Y__ Z +__ X Y Z + XYZ + ___ ___ __ __ __ X Y Z __ = (X Z + X Z) Y + (XZ+ X Z )Y = (X⊕Z) Y + ( X ⊕ Z )Y = (X⊕Z)⊕ Y = X⊕Y⊕Z Pay particular attention to the patterns that produced the XORs!

K-map for C

Page 36

Gray in

X

A

Y

B

Z

C

Binary out

Gray to Binary Decoder A Gray to binary decoder is an example of a circuit that could be packaged as a specialized circuit. As an example of a more complex decoder, consider the 7-segment display

a f

e

b g

c

d This are used to produce representations of decimal digits and (to a lesser extent) the hex characters A-F as follows: 0 _ | | |_|

1

2 _ | _| | |_

3 4 5 6 _ _ _ _| |_| |_ |_ _| | _| |_|

7 _

8 9 A B C _ _ _ _ | |_| |_| |_| |_ | | |_| | | | |_| |_

D

E _ _| |_ |_| |_

F _ |_ |

Pay particular attention to the difference between the representations for 6 and B (a common mistake is to interpret the B pattern as 6). Note that a logic circuit to convert 4-bit (hexa)decimal data to 7segment display format will require 7 outputs, one for each of segments a,b,c,d,e,f,g. If only a BCD conversion is needed, then the circuit is simplified (somewhat) because for the inputs for A,B,C,D,E,F, the outputs are don’t cares. The construction of such a circuit can be achieved by the means already covered, albeit with some tedium due to the number of outputs. The SN7447 chip is a BCD to 7-segment display decoder/driver (LED segments have to be protected from excess current, a capability built in to this chip so that it can directly drive LED segments without use of pull-up resistors). A worked out circuit diagram for this chip follows:

Page 37

SN 7447: BCD to 7-segment Display Decoder/Driver a BD f b __ e

g

AC

c

_ __ __ AB C D

d A

B

__ a

__ A

1

2

BD

A

_ ABC

__ B

_ ABC

__ b

B C

__ C

4

CD _ _ ABC

__ c

C D

__ D

8 D

__ AB C __ A BC

__ d

ABC

____ BI RBO

A Wired AND

__ BC

__ e

AB __ BC

__ f

__ AC D

____ LT ______ RBI

ABC

Lamp test

___ BCD

__ g

BI ≡ Blanking Input; RBO ≡ Ripple Blanking Output; LT ≡ Lamp Test; RBI ≡ Ripple Blanking Input Points marked ⊗ are take HIGH by taking the blanking input line LOW (this forces all outputs HIGH)

Page 38 BI (Blanking Input), RBI (Ripple Blanking Input), and LT (Lamp Test) have no effect if they are not connected or if their lines are held HIGH. –

If the blanking input is taken LOW, a 1 is forced at each point marked ⊗, in effect blanking all LED by taking their lines high



Taking the lamp test input LOW forces the internal lines representing A,B,C to go LOW, which internally produces the same effect as an input of numeric 0 or 8, thus enabling LED lines a,b,c,d,e, and f. LED line g requires an additional enable via the internal lamp test line.

Taking the ripple blanking input line LOW enables the six input NAND gates _ _ __ in __ the __ circuit to respond to the internal lines representing A , B , C , D , which will then cause the blanking of the LEDs if the numeric value of the input is 0. To suppress leading 0’s in a sequence of digits, the blanking input line for each digit is used as an output (Ripple Blanking Output) connected to the ripple blanking input line of the digit of next lower order (note that as soon as a non-zero digit occurs in the sequence, it produces a HIGH signal on RBO which will then cause ripple blanking to be disabled for all subsequent lower order digits). Careful examination of the circuit shows that segment a is not lit for the number 6! –

BCD to 7-segment display function table: __ __ __ __ __ __ __ D C B A a b c d e f g 0 0 0 0 0 0 0 0 1 1 -

0 0 0 0 1 1 1 1 0 0 -

0 0 1 1 0 0 1 1 0 0 -

0 1 0 1 0 1 0 1 0 1 -

0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 - (non-BCD input

0 0 1 1 1 1 Remark: the SN7447 0 1 0 display pattern 1 1 0 for 6 is given by 1 0 0 |_ 1 0 0 |_| 0 0 0 1 1 1 0 0 0 1 0 0 - - combinations are all don’t cares)

Standard K-map analysis results in the following equations: __ __ __ __ __ [BD added in from don’t cares for blanking output a = A B C D + A C + BD __ __ __ [BD added in from don’t cares for blanking output b = A B C + A BC + BD __ __ __ [CD added in from don’t cares for blanking output c = A B C + CD __ __ __ __ __ d = ABC + A B C + A B C __ __ e = A + BC __ __ __ __ f = AB + B C + A C D __ __ __ __ g = ABC + B C D

purposes] purposes] purposes]

Page 39 Arithmetic circuits: Half adder – 2-bit addition is accomplished by XOR. A circuit for 2bit addition that outputs both the sum (S) and carry (Cout) is called a half adder (a full adder also accounts for an input carry from a prior addition (Cin) X

Y

S

Cout

0 0 1 1

0 1 0 1

0 1 1 0

0 0 0 1

X Y

S Cout Half adder (HA)

Full adder – To accommodate an input carry we have X

Y

Cin

S

Cout

0 0 0 0 1 1 1 1

0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1

0 1 1 0 1 0 0 1

0 0 0 1 0 1 1 1

S = X ⊕ Y ⊕ Cin by the Gray to binary __ Cout = X_YC _ in + _XYC _ in = ( X Y + X Y )Cin

the same analysis used for the C output variable of decoder discussed earlier. __ __ + X Y Cin + XY __ C in which reduces nicely to + XY(Cin + C in) = (X ⊕ Y)Cin + XY

Both (X ⊕ Y)Cin and XY are produced by two half adders arranged as follows: X Y

S

Cin XY

(X ⊕ Y)Cin

Hence to get a full adder (FA) we simple use two half-adders with an OR gate applied to the two carries: X Y Cin

HA

HA

S Cout

Page 40 4-bit parallel adder: Input is two 4-bit quantities (X3,X2,X1,X0) and (Y3,Y2,Y1,Y0). Input corresponding digits to each full adder circuit and propagate each carry out to the carry in of the next higher full adder. X3 Y3

X2 Y2

X1 Y1

X0 Y0

FA

FA

FA

FA

Cout

S3

S2

S1

Cin

S0

It is evident that this technique can be extended for multiple bits. The major drawback to this circuit construction is the fact that the carry propagation must go through many circuit levels to reach the high order bit. For this reason, adders may employ carry anticipation; for example, for a 2-bit adder, the Cout value can be determined combinationally by examining its specification or simply employing logic; i.e., Cout is given by (X1 AND Y1) OR [carry out via X1 and Y1 alone] ((X1 OR Y1) AND X0 AND Y0) OR [carry out via carry in from 1st FA] ((X1 OR Y1) AND Cin AND (X0 OR Y0) Multiplier: Input is two 3-bit quantities (X2,X1,X0) and (Y2,Y1,Y0). Think in terms of the construction X1 X0 X2 Y2 Y1 Y0

X2Y2

X2Y0 X1Y0 X0Y0 X2Y1 X1Y1 X0Y1 X1Y2 X0Y2

X2Y0 +2 . . .

+2 X0Y0

where +2 is the binary addition accomplished by a full adder. The number of gates for this kind of construction is the reason multiplication circuits may use sequential circuit techniques (to be covered later). Subtraction: Full and half-subtractors can be constructed analogously to full and half-adders. Half subtractor – 2-bit subtraction is also accomplished by XOR. A circuit for 2-bit subtraction that outputs both the difference (D) and borrow (Bout) is called a half subtractor (a full subtractor also accounts for an input borrow from a prior subtraction (Bin) X

Y

D

Bout

0 0 1 1

0 1 0 1

0 1 1 0

0 1 0 0

X Y

D Bout

Half subtractor (HS)

Page 41 Full subtractor – To accommodate an input borrow we have X

Y

Bin

D

Bout

0 0 0 0 1 1 1 1

0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1

0 1 1 0 1 0 0 1

0 1 1 1 0 0 0 1

D = X ⊕ Y ⊕ Bin by the same analysis used for the C output variable of the Gray to binary decoder discussed earlier. __ __ __ __ __ Bout = X__Y__ Bin + X Y B in + __X YBin + XYB ____ ___reduces _ __ in which __nicely to = ( X Y + XY)Bin + X Y(Bin + B in) = ( X ⊕ Y )Bin + X Y ____ _____ __ Both ( X ⊕ Y )Bin and X Y are produced by two half subtractors arranged as follows: X Y

D

Bin

________ ( X⊕Y)Bin

XY

Hence to get a full subtractor (FS) we simple use two half-subtractors with an OR gate applied to the two borrows: X

HS

Y

D

HS

Bout

Bin

4-bit parallel subtractor: Input is two 4-bit quantities (X3,X2,X1,X0) and (Y3,Y2,Y1,Y0). Input corresponding digits to each full subtractor circuit and propagate each borrow out to the borrow in of the next higher full subtractor.

Bout

X3 Y3

X2 Y2

X1 Y1

X0 Y0

FS

FS

FS

FS

D3

D2

D1

Bin

D0

Page 42 Just as for the adder circuit, it is evident that this technique can be extended for multiple bits. Note that the difference between the adder and subtractor circuits is in how the propagated signal is dealt with (whether carry or borrow). BCD adder: Recall that BCD addition required adding 6 if the sum exceeded 9. A BCD adder can then be formed by combining a 4-bit binary adder with circuitry to make the adjustment when the sum exceeds 9. Note that the test for 9 or greater is R3•(R2+R1+R0). X 3 Y3 X2 Y2 X1 Y1 X0 Y0

carry in

4-bit binary adder

R3

R2

R1

R0

carry out HA

FA

HA

S3

S2

S1

(add 6) test for result > 9 S0

BCD Sum Note that when the “exceeds 9” test is 0, the HA,FA,HA combination simply adds in 0, which has no effect on the sum; otherwise, 011 is added to R3R2R1 , in effect adding 6. Other specialized circuits: AOI gates: (AND-OR-Invert)

__ Suppose you have an expression such as (A + B + C)(A + B). Then double-inverting and applying the DeMorgan property, this becomes __ __ __ __ __ (A + B + C)(A + B) = ( A B C )+ ( A B ) which is an AND-OR-Invert expression. Hence AOI gates are employed to implement product of sums expressions. A 2-wide, 3-input AOI gate has the form:

Page 43 Decoders/demultiplexers: Both the Gray to binary decoder and BCD to 7-segment display decoder/driver constructed earlier are cases of a class of circuits called decoders and demultiplexers. Basically, a decoder translates input data to a different output format. Of particular interest is a decoder that decodes an “input address” to activate exactly one of several outputs. In particular, a 1 of 2n decoder is one for which exactly one of 2n output lines goes High in response to an n-input address. If there is a data input line also, and the selected output matches the data input, then the circuit is called a demultiplexer. Example 1: 1 of 8 demultiplexer Data in

Address in

0 1 2 3 4 5 6 7

1 2 4

Addressed outputs

In essence a demultiplexer routes the input data to the addressed output. Example 2: Constructing a 1 of 16 decoder/demultiplexer from two 1 of 8 decoder/demultiplexers Decoder/demultiplexers usually include a “chip select” or “enable” input to activate/deactivate the circuit. With an enable input a larger decoder/demultiplexer can be constructed from smaller ones; for example, a 1 of 16 decoder/demultiplexer can be constructed from two 1 of 8 decoder/demultiplexers as follows: Data in

1 2 4

CS

0 1 2 3 4 5 6 7

0 1 2 3 4 5 6 7

Addressed outputs

1

Address in

2 4 8

1 2 4

CS

0 1 2 3 4 5 6 7

8 9 10 11 12 13 14 15

This kind of construction is very useful for addressing memory.

Page 44 A 1 of n decoder can also be used to directly implement a logic function. For example, the specification f(X,Y,Z) = Σ(2,5,6) can be implemented using a 1 of 8 decoder by

X Y Z

0 1 2 1 of 8 3 decoder 4 5 1 6 2 7 4

f(X,Y,Z) = Σ(2,5,6)

Internally, a decoder simply uses AND gates to produce the desired outputs; e.g., a 1 of 4 decoder has the construction 0 1 2

Address in

Addressed outputs

3 So the circuit implementation for f(X,Y,Z) as implemented above is just a sum of products (in fact, the canonical form since it is just minterms OR’ed together). Multiplexers: A multiplexer circuit is the inverse of a demultiplexer and is even more useful for implementing logic circuits because it does not require OR’ing of outputs. An 8 input multiplexer has the form

Data in

Address in

0 1 2 3 4 5 6 7

Output Output

1 2 4

CS

For a multiplexer, the address refers to the input lines. The output value is that of the addressed input. Normally, both a chip select line and the complement of the output are also provided.

Page 45 A 4 input multiplexer (MUX) has the construction:

0 1 2

Output

3

Output

Address in The basic addressing strategy is the same as for a decoder, but for a multiplexer the AND gates are also used to enable (or suppress) input values. Chip select is not implemented above, but can be accomplished by increasing the input capacity of each AND gate, attaching the chip select line to each AND. The OR gate that had to be supplied externally when using a decoder to implement a logic function is now incorporated into the construction. Implementing a logic function using a multiplexer is best illustrated by an example. Suppose that the specification f(A,B,C,D) = Σ(0,2,3,11,14) is what is given. f(A,B,C,D) can be implemented using an 8-input multiplexer as follows: A 0 1 2 3 4 5 6 7

0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

B 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1

C 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1

D 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

__ D

0

1

1

1

0

2

0

0

3

0

4

D

5 6

D

0 __ D

0 __ D

A B C

1

f(A,B,C,D) 1 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0

__ D

0

f(A,B,C,D) __ f

0 7

2 4

CS

Note that columns A,B,C select 0,1, ..., 7 in pairs, each of which __ corresponds to one of 0,D, D ,1 on the output side. This provides a mapping from the truth table to an 8-input MUX as indicated. The SN74151 chip is an 8-input MUX commonly used for this purpose.

Page 46 Comparators: A comparator takes two input values and reports them as . Starting from the most significant bit, the comparator cascades comparisons until corresponding bits are found that are different (the limiting case is all bits are equal). The first occurrence of corresponding bits that are different determines whether the output should be > or

=


>


The circuit allows for cascading of comparators, where input from a comparator testing higher order bits may have already determined the outcome. Tracing the circuit strategy as indicated in the annotation shows that it implements the approach sketched out above.

Page 47

More specifically, the comparator as given is based on standard comparison logic; i.e., case: 1. the "" input line is 1 (the outcome is already ">" based on higher order bits) then the "" output line will be 1

3.

the "=" input line is 1 (the higher order bits are all "=", so the comparison depends on lower order digits) then if A3 < B3 OR A3 = B3 AND A2 < B2 OR A3 = B3 AND A2 = B2 AND A1 < B1 OR A3 = B3 AND A2 = B2 AND A1 = B1 AND A0 < B0 then the "" output line will be 0 else if A3 > B3 OR A3 = B3 AND A2 > B2 OR A3 = B3 AND A2 = B2 AND A1 > B1 OR A3 = B3 AND A2 = B2 AND A1 = B1 AND A0 > B0 then the "" output line will be 1 else (the result must be "=") the "" output line will be 0

Particular attention should be given to how the logic has been implemented in the circuit diagram. Contrast this to an approach that seeks to work from a truth table specification to a minimal sum of products or product of sums solution.

Page 48 Quine-McCluskey procedure: (optional non-graphical approach to reduction) As the number of variables increases, the K-map graphical reduction technique becomes increasingly problematic. The Quine-McCluskey procedure is an algorithmic alternative best employed for computer implementation and is covered for completeness. Step 1: Lay out the minterms in groups having the same number of 1’s, groups ordered by increasing numbers of 1’s. This is a listing of all blocks of 1. Step 2: Compare each group to the one immediately below it to form all blocks of 2. Flag each block of 1 when it is used in forming a block of 2. Repeat this process on the blocks of 2 to form all possible blocks of 4, then blocks of 8, and so on. Flag each block when it is used to form a larger block. Any blocks not used in forming larger blocks are carried forward to step 3. Do not list any blocks formed redundantly (e.g., a block of 4 occurs has 4 blocks of 2 and so can be formed 2 different ways) Illustration: A

B

C

D

f

blocks of 1

blocks of 2

blocks of 4

0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1

0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 0 1 1 1 1 0 0 1 0 1 1 1 1 1 1

1)

0000 *

1) -0-0 2) --00

2) 3) 4)

0010 * 0100 * 1000 *

1) 00-0 * 2) 0-00 * 3) -000 *

5) 6) 7) 8)

0011 0101 1010 1100

* * * *

9) 10) 11)

1011 * 1101 * 1110 *

12)

1111 *

4) 5) 6) 7) 8) 9)

001-010 010-100 10-0 1-00

* * * * * *

10) 11) 12) 13) 14) 15)

-011 -101 1011-10 11011-0

* * * * * *

16) 1-11 * 17) 11-1 * 18) 111- *

3) -014) -105) 1--0 6) 1-17) 11--

Page 49 Step 3: Form the table of minterms and blocks from the first 2 steps. Mark each minterm participating in a block in the corresponding rowcolumn as illustrated below. Any column with a single entry is essential. Continuing with the example we have: 0000 0010 0011 0100 0101 1000 1010 1011 1100 1101 1110 1111 -0-0

*

__ --00 B C -01__ B C -10-

*

*

* *

*

*

*

*

* *

*

*

*

1--0

* *

1-1-

* *

11--

*

*

*

* *

*

*

*

*

*

Step 4: Remove the rows associated with essential entries along with any columns intersected by one or more of these rows. Put the terms representing the rows into the final sum. If 2 rows are identical, first eliminate based on dominance (number of 1’s), next arbitrarily. Repeat Steps 3 and 4 until all rows are used. In the example, all rows get removed the 2nd time step 4 is used. 0000 1000 1110 1111 __ __ CD

-0-0

*

*

--00

*

*

1--0 AB

*

Identical rows (remove 1 arbitrarily) *

1-1-

*

*

11--

*

*

Identical rows (remove 1 arbitrarily)

__ __ __ __ f(A,B,C,D) = AB + B C +B C + C D Repeat Steps 3 and 4 until all rows are used. Note that in the example, all rows get removed the 2nd time step 4 is used. Step 5: When an identical row is removed arbitrarily in Step 4 (no dominance), repeat the process for the alternate case - all combinations of duplicate row elimination should be explored and the minimal expression for each case generated. The user can then select from among these (which may provide additional possibilities for combinations. In the above __ __example, B⊕C is __ __present in the result given. Alternatively, B D can replace C D and AC can replace AB).

Page 50

Logic level: Sequential Logic Sequential logic addresses circuits that have current-state, nextstate behavior; ie., are of the form: Inputs

Outputs

Combinational Circuit

Current State

Storage Elements

Feedback Loop

Next State

Sequential Circuit The storage elements provide current state inputs, which together with external inputs are the inputs for a combinational circuit whose outputs provide the external outputs for the sequential circuit and the next state (to be captured in the storage elements to form a “feedback loop”). The circuit is clocked in the sense that the circuit only changes state when a “clock” signal is received; ie., the next state output is captured in the storage elements (to become the current state) only on a clock pulse, typically on a clock transition from 0 to 1. A state diagram is used to specify the current-state, next-state behavior of a circuit. If there are 2 inputs, then for each state, there are up to 4 possible next states that must be specified. The fundamental circuit have current-state, next-state behavior is called a flip-flop. A flip-flop has 2 stable states (0 and 1); ie., it is bistable. It stores a single bit of information and maintains its state as long as power is supplied to the circuit. State change occurs only in response to a change in input values. Types of flip-flops differ as to the number of inputs and how the inputs affect the state of the device. The most basic type of flip-flop is called a latch. Latches can be used to store information, but are subject to race conditions (the latch has a “setup time”, during which there may be an output value that is wrong, which may race to a another part of the circuit and cause a transition that should not occur – this is not an issue for combinational circuits so long as they are not being used in a sequential context).

Page 51 Set-Reset latches: The SR-latch formed from NOR gates is one of the fundamental latches that can be formed from basic logic gates. It has the construction: R

Q

__ Q S Each NOR gate’s output is fed back to the other’s input. SR stands for “Set-Reset”. The behavior (characteristic table) can be tabulated by __ S R Q Qnext Q next 0 0 0 0 1 1 1 1

0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1

0 1 0 0 1 1 0 0

1 0 1 1 0 0 0 0

no change reset to 0 active transitions occur

set to 1 unstable

The state diagram for an SR flip-flop is given by 00,10

00,01 10 0

1 01

Valid inputs are 00,01,10 If _ _ __ NAND gates are used instead of NOR, the result is called an __ S R -latch. S Q

__ R

__ Q

The reason for this becomes__clear __if the behavior is tabulated against S and R inputs rather than S and R . Note that the behavior duplicates

Page 52 that of the SR latch, except for the invalid case; ie., the characteristic table is: __ S R Q Qnext Q next 0 0 0 0 1 1 1 1

0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1

0 1 0 0 1 1 1 1

1 0 1 1 0 0 1 1

no change reset to 0 active transitions occur

set to 1 unstable

It is instructive to examine timing considerations for the two cases where there are transitions as the latch sets up the new output values. Assume that it takes a discrete time interval ∆t before the output of a gate registers so our viewpoint for the latch is R

S

∆t

Q

∆t

__ Q

__ Taking “snapshots” of Q and Q at time intervals 0, ∆t, 2∆t we get __ S R Q Q elapsed time Active Reset to 0

0 0 0

1 1 1

1 0 0

0 0 1

0 ∆t 2∆t

-- response to R = 1 sets Q __ to 0 -- response to Q = 0 sets Q to 1

Active Set to 1

1 1 1

0 0 0

0 0 1

1 0 0

0 ∆t 2∆t

__ -- response to S __ = 1 sets Q to 0 -- response to Q = 0 sets Q to 1

In both instances it takes 2∆t for the circuit to stabilize. Flip flops are usually handled synchronously with inputs held at the the no change state until a clock pulse occurs. A gate can be used for this purpose, for example, with an AND gate: input clock If clock = 0 then output = 0.

output If clock = 1, then output = input.

Page 53 The basic SR-latch has no provision for clock input and so is configured for asynchronous usage. Note that since Qnext is a function of S,R,Q we can derive a next state equation as follows: Qnext = f(S,R,Q) = Σ(1,4,5) + d(6,7) from the earlier tabulation and the K-map is RQ 00 01 11 10 S 0

0

1

3

2

5

7

6

1 4

1

1

1

-

-

__ so Qnext = S + R Q __ __ To add a control (or enable), the S R -latch turns out to be the most natural underlying latch because it responds to inverted inputs: ____ CS

S

Q C ____ CR

__ Q

R Separate preset and preclear lines can be added to allow flip-flop initialization without using the controlled inputs. Preset ____ CS

S

Q C ____ CR

__ Q

R Preclear Clock signals typically are produced in the form of square waves

or regularly spaced pulses

Page 54 Edge-triggered flip-flops: There is a “voltage setup time” when the signal changes from 0 to 1. The edge of the pulse for the 0 to 1 transition is called the leading edge, and for the 1 to 0 transition the trailing edge. leading edge

trailing edge

Voltage setup interval

An edge-triggered flip-flop changes state when the edge is reached. The value of the flip-flop remains constant until the next edge is reached. There are leading edge triggered and trailing edge triggered flip-flops. Normally all flip-flops in a circuit should trigger on the same edge. If types are mixed, the leading edge can be converted to trailing edge by inverting the control input (and vice-versa). Flip flops are designated by the symbols

the first for leading edge triggered and the second for trailing edge __ triggered. marks the control input. Note that the output for Q is marked as well. For example, the flip-flops on the SN7473 are trailing edge triggered and those on the SN7474 are leading edge triggered. Master-Slave flip-flops: A master-slave flip-flop combines two flip-flops (with controls) where the “master flip-flop” triggers on the leading edge. The “slave flipflop” then triggers on the trailing edge in response to the values of the master flip-flop. S Q

Q

__ Q

__ Q

C

R Master ff: leading edge triggered

Slave ff: trailing edge triggered

Page 55 There are two virtues to this construction: 1. the overall output does not change while the control input is high, since the overall output comes from the slave flip-flop, which sets up only when the control input goes low 2. the slave flip-flop is isolated from the rest of the circuit, responding only to the master flip-flop’s value. (without this kind of protection in a circuit with multiple interconnected flip-flops, a race condition may occur, where an intermediate value gets latched rather than a final value). From the external view point, the master-slave flip-flop triggers on the trailing edge. A note on latches: Although basic latches should be avoided when a circuit requires multiple flip-flops, basic latches still have uses. Example: debouncing a switch the mechanical nature of a physical switch precludes a smooth transition between 0 and 1 when the switch is opened or closed. This phenomenom is called bounce, because the switch value may haphazardly alternate between open and closed as the switch contacts separate on opening or connect on closing. It is a simple application to debounce a “single pole, double throw switch” using a basic latch; eg., Vcc

__ S Q

SPDT switch __ R

__ Q

GND The two resistors are needed to prevent a short circuit between Vcc and GND for the input connected through the switch (they are called “pull-up resistors” because when connected between Vcc and GND, they pull the voltage on the Vcc side of the resistor up to logic 1). When the switch as shown above is thrown to its opposite position, __ the flip-flop will set to 1 the first time 0 detected on S , and __ will hold that value because __ __if a bounce takes S back to 1, the effect is applying 1,1 on R , S which is the no-change state of the flip-flop (ie., the flip-flop can’t revert to its prior value). Generally, the term “latch” is only used in reference to flip-flops whose outputs are not protected from intermediate values while setting up. Unless qualified by the term “latch”, the use of the term “flip-

Page 56 flop” normally refers to a leading or trailing edge triggered flip-flop that is protected. The master-slave construction is one approach used for producing flip-flops. The SN7473 and SN7476 are in this category. An aside about electricity: The example of debouncing a switch may arouse curiousity regarding use and selection of resistors with TTL integrated circuits such as the SN7400 (quad 2-input NAND chip). Selection requires the application of a small amount of knowledge about voltage, resistance, and electric current. Ohm’s Law: Ohm’s Law is the relationship between electromotive force E (measured in voltage, symbolized by V), current I (measured in Amperes, symbolized by A), and resistance or impedance R (measured in Ohms, symbolized by Ω); namely, E = IR This is closely related to Joule’s Law of power (measured in Watts, symbolized by W); namely, P = EI Current is the rate of flow of electric charge in a circuit and is measured in electron charge. By international standard, 1 Ampere of current is defined as the flow of 6.24 × 1018 electron charges (called a Coulomb) per second. It’s rather bizarre value is derived from the number of atoms in a gram of Carbon. Note that the relationship between current and resistance is I=E/R, so current is inversely proportional to resistance at constant voltage. When plotted, the curve is I

R 1

10

The area under the curve is given by multiplying current by resistance; ie., it represents voltage. It is also given by the natural logarithm, as discussed in calculus classes. Standard resistor values: If the curve is scaled by the inverse of the natural logarithm of 10 (1/ln(10)=.4343), the area is given by the base 10 logarithm and consequently the area between 1 and 10 is 1V. Manufacturers have chose to use impedance values that equally divide this area into 6, 12, or 24 equal subareas (the E6, E12, and E24 series). 1/6 = .166667=p and the impedance values are then 100=1, 10p=1.468, 102p=2.154, 103p=3.162, 104p=4.642, 105p=6.813. The values adopted for the E6 resister series are 1.0, 1.5, 2.2, 3.3, 4.7, 6.8, which approximate the above calculations. Resistors are chosen whose Eseries value is a close match for the value needed. For example, if

Page 57 a 50000Ω resistor is needed, then 47KΩ is used from the E6 series or 51KΩ from the E24 series. If a 47KΩ resistor is used in the debouncing circuit above, and Vcc is at +5V, the current flow is I = 5/47000, which is .000106 Amps or 0.106 mA, where mA designates milliamps. TTL draws no more than .04mA for 1 to be detected at an input; ie., 47KΩ resistors are commonly used as pull-up resistors when working with TTL chips. Using a higher resistor value reduces the current draw (and thus, power consumption) but the circuit may fail to work if the power at the input is inadequate. Batteries: Batteries have an internal impedance which varies according to battery size and type. As a battery is used, its impedance grows, reducing power output. Alkaline batteries: A fully charged 1.5V alkaline cell will have an impedance of about 0.32Ω, which means that the limiting current between terminals is 4.7A. NiCad batteries: NiCad batteries in contrast have about half the capacity (stored energy) of alkalines, but hold their voltage relatively constant during discharge (alkalines lose voltage linearly). The basic NiCad cell is 1.2V and when fully chargned has an impedance of about 0.12Ω. yielding a maximum current of about 10A; ie., NiCad batteries can supply power at about twice the rate of alkalines and so are used in more power hungry applications. Following Joule’s Law, Amps × Volts × time = Watt hours is used as a measure of power consumption. It follows that an alkaline cell can provide up to 7 Watts and a NiCad cell up to 12 Watts of power. Battery capacity is usually measured in Amp hours rather than Watt hours. Batteries in series: Putting batteries in series increases electrical potential additively; ie., two alkaline cells in series produces a 3V battery. Impedance also doubles, so there is no change in maximum discharge characteristics. Batteries in parallel: If batteries are placed in parallel, then the voltage is unaffected and the impedance is changed according to R2/2R = R/2. For 2 alkaline cells this is 0.16, increasing the discharge maximum to 9.4A or doubling its current capacity. This assumes that the batteries are matched. Note that in parallel, a weak battery will tend to discharge its companions, since Mother Nature seeks balance. Alternating current: Batteries produce direct current (DC) with current flow in one direction (source to ground). A current for which the current flow reverses direction cyclically is called alternating current (AC) and

Page 58 is produced by rotating a wire coil through a magnetic field. Magnets have poles (+ and -), so if the coil is first oriented + -, after a 180° rotation it will be oriented - + and the induced voltage will reverse. If the rotation is constant then the voltage will follow a sinusoidal pattern. In the US, the AC standard for house wiring is 60 cycles per second alternating between -120V and +120V. AC is used because it is relatively efficient to transform it to high voltage for transmission (which requires less current flow to move the same amount of power). Of course it has to be transformed back to safer levels for use in the home. Devices called rectifiers are used to convert AC power to DC. House current can be converted by using both a transformer and a rectifier to produce a DC output that can be used in place of a battery (just be sure that the voltage is correct for the use intended). A 60 Watt 120V light bulb requires 60/120 = 0.5A. Circuit capacity is limited by the amount of current the transmission wire can handle before its natural resistance causes overheating (and failure). Increasing wire diameter, or braiding together multiple wires, reduces resistance and increases capacity. To protect the transmission wire, a fuse is used to keep from overloading the circuit. A 20Amp 120V circuit can handle a load of 2400 Watts (ie., two 1500 Watt hair dryers will blow the fuse). D-latches and D flip-flops: A D-latch (D for delay) has the form: D Q C __ Q

__ It is the (clocked) SR-latch with S fed to the R input. Hence, it triggers on the leading edge. Obviously, the same minor modification applied to the master-slave SR flip-flop covered earlier will produce a master-slave D flip-flop. The value of a D flip-flop is just the input, but one cycle behind (hence the term delay). It should be noted that a D flip-flop has only one input. An alternative construction of a D-latch: A Tri-state Buffer is a is a gate whose output can be in one of three states, 1, 0, or null (same as no contact). It has the form Ctrl Input

Output

When the Ctrl = 1 then Output = Input; when Ctrl = 0, Output = null.

Page 59 Tri-state buffers can be used to construct a D-latch as follows: Clock Q

__ Q

D

When the clock value goes high, output Q = input D; ie., the latch is leading edge triggered. Either this construction or the NAND construction produces a viable D-latch. Two D flip-flop constructions based on D-latches are as follows: D-latch (master)

D

D

ck

ck

Q

D-latch (slave) D

Q

Q

Q

Q

ck Q

Master-Slave D flip-flop The master-slave construction works with either version of the D-latch since both trigger on the leading edge. The overall construction is a trailing edge triggered flip-flop. __ __ The next construction uses 3 S R latches cleverly to produce a leading edge triggered D flip-flop. By inverting the clock input, the masterslave version can be converted to leading edge triggered, but it requires more logic gates.

Page 60

D

D0

Q

x

ck y

D

D0 = D x,y = 1 x = D0 y = D0

when when when when

ck ck ck ck

= = = =

0 0 1 1

Q

Leading Edge Triggered D Flip-flop When ck = 0, both x and y are held at 1, the no change state__for the right-most latch. At the same time the upper latch outputs D and feeds it to the lower latch to produce D internally. When the clock __ rises to 1, D is latched at y__ and D at x, to be latched by the rightmost flip-flop as Q and Q . If D is changed while ck = 1 and x has latched 0, there is no effect. If x = 1, then y = 0 blocks any change in D from affecting x (the purpose of the feedback from the lower to the upper latch) and also prevents the feedback from the upper latch from affecting the lower latch. Hence, the flip-flop latches the value on the leading edge. In effect, the flip-flops in the circuit set up based on the values from the prior clock cycle, and so all inputs are stable each time the triggering edge is reached. Other flip-flops: While the SR-latch has uses in practice, the SR flip-flop does not because it does not make use of a 1,1 input. As we have seen, a D flip-flop uses a single input (other than ck). A T flip-flop also uses an single input and simply “toggles” the state when the input is 1. A JK flip-flop combines the SR flip-flop and toggles when inputs are 1,1. T flip-flop: When T=0, the flip-flop values are unchanged. When T=1, the next state is the opposite of the current state. Hence, the characteristic table for the flip-flop is given by:

Page 61

T

Q

Qnext

0 0 1 1

0 1 0 1

0 1 1 0

toggle when T = 1

__ __ Qnext = T Q + T Q = T ⊕ Q JK flip-flop: This flip-flop just combines the functions of the SR and T flipflops and so is widely used. Its characteristic table is given by J

K

Q

Qnext

0 0 0 0 1 1 1 1

0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1

0 1 0 0 1 1 1 0

no change reset to 0 set to 1 toggle

__ __ K-map analysis shows that Qnext = J Q + K Q. Excitation controls: In a circuit, flip-flop inputs have to be set to produce desired nextstate behavior. This is trivial for the D flip-flop. For the JK flipflop excitations are given by Qpresent Qnext

J

K

0

0

0

d

0

1

1

d

1

0

d

1

1

1

d

0

Excitation: J,K values as a function of Q and Qnext

Any flip-flop has present-state, next-state capabilities, so any flipflop type can be produced from any other flip-flop type. Example: A T flip-flop from a JK flip-flop J

Q

K

__ Q

T

ck

Page 62 Example: A JK flip-flop from a D flip-flop The key to the construction is to set it up as follows:

A Combinational circuit that uses both external and current state values to determine the controls that produce the spec’d next state

J K

Q __ Q

ck Type of flip-flop being created

Type of flip-flop being used

This guides the table to construct as follows: J

K

Q

Qnext

D

0 0 0 0 1 1 1 1

0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1

0 1 0 0 1 1 1 0

0 1 0 0 1 1 1 0

J D controls producing spec’d next state

KQ

00

01 0

0

10 3

2

5

7

6

1 4

1

11 1

1

1

1 __ __ D = JQ + K Q

JK spec so our diagram becomes

J

Q

K

__ Q

ck

Page 63 Example: Make up your own flip-flop and construct it from JK flip-flops Specify the characteristic table and the JK excitations that will produce the same next state behavior. U

N

F

Q

Qnext

J

K

0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1

0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0

0 0 0 0 0 0 0 1 -

0 1 1 1 1 1 1 1

UN

FQ

00

01

00

0

01

0

11

0

10

0

11 1

0

-

13

-

00

00

-

01

-

11

-

10

-

0 5

J = UNF

7

13

15

J

Q

K

__ Q

N F ck

14

11

1

K = U+N+F

U

6

-

1 9

1

2

-

1

1 8

10

10 3

1

1 12

0

11 1

4

1 11

01 0

14

15

-

FQ

6

0

9

8

2

7

-

-

UN

0

-

12

10 3

5

4

characteristic equation of the flip-flop: __ __ __ __ Qnext = U N F Q + U N F Q

10

-

Page 64 Example: Race Condition S D C D R

clock Assume that leading edge-triggered D flip-flops are being used (say of the type described earlier). Then for 0 to 1 transition on the latch enabled by the control line C, any of (0,1), (1,1), (1,0) may be latched depending on when the clock rises. Note that even if the control line is controlled by the clock, it could rise ∆t ahead of the clock signal at the D flip-flops, the point at which the latch outputs are (1,1) when an active transition is in progress. Registers: A row of associated flip-flops in series or in parallel is called a register. The combinations are: • serial in, serial out (slow devices) • serial in, parallel out (slow in, fast out) • parallel in, serial out (fast in, slow out) • parallel in, parallel out (fast in, fast out) A shift register uses serial in, serial out. input

D

D

D

D

output

clock Every clock pulse the flip-flop values shift one to the right. The left-most flip-flop obtains its new value from the input line and the value of the right-most flip-flop is the output at each clock pulse. It should be noted that this requires all leading edge or all trailing edge flip-flops to work properly. If the output is fed back to the input, the shift is called a circular shift. Three-state logic is needed to construct a shift register that can shift in either direction.

Page 65 shift right input left

D

D

D

output right

D

input right output left

shift left In contrast, parallel input has the appearance i0 i2 i1 i3 D

D

D

D

ck Counters: Counters are often needed to control tasks such as count by 8 to shift in 8 bits (1 byte) serially. T flip-flops provide a natural means for constructing a mod 2n ripple counter (counts cyclically 0 to 2n-1). It can be initialized to 0 via the “clear input” provided on most flip-flops. Q1 Q0 Q2 J J J ck K K K enable If trailing edge flip-flops are used, then when enabled, the counter operates according to Q0 changing with the clock falling, Q1 with Q0 falling, and Q2 with Q1 falling as given by: count clock 0 1 2 3 4 5 6 7

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

Q2

Q1

Q0

0

0

0

0 0 0

0 1 1

1 0 1

1 1 1

0 0 1

0 1 0

1

1

1

Q0 falls

Page 66 Sequential circuit design: Sequential circuits make transitions from state to state in response to inputs. Sequential circuits are physical realizations of a kind of theoretical machine called a finite state automaton (FSA). An FSA can be described by use of a graphical representation called a state diagram. An FSA is given by specifying: 1. An input alphabet I 2. An output alphabet O (possibly NULL) 3. A finite set of states S 4. A start state, s0 ∈ S 5. A transition function f:S × I → S (this is the next state function, where f(current-state, current-input) = next-state) 6. Moore circuit – output is on the state (may be NULL) an output function g:S → O is given 7. Mealy circuit – output is on the transition (may be NULL) an output function h:S × I → O is given Examples: 1. Serial parity checker – input is data (having a parity bit) and output is the current parity bit (odd parity) • Input alphabet is {0,1} • Output alphabet is {0,1) • States are {S0, S1} • S0 is the start state • The transition function is given by the state diagram (Moore circuit) 0

1

0 S1/0

S0/1

S × S0 S0 S1 S1

I 0 1 0 1

S output S0 1 S1 0 S1 0 S0 1

1 The parity bit is an added data bit used to check for occurrence of an error in data. It is commonly employed with memory circuits, where any error indicates a serious problem (usually a failed memory chip). The parity bit is usually appended to the data bits. For odd parity, the added parity bit is selected so that the total number of 1’s is odd. For even parity, it is selected so that the total number of 1’s is even. For example, if the data is 0 1 0 1 0 0 1 0 and odd parity is being used, then the parity bit data including the parity bit is 0 1 0 1 0 0 1 0 0 For the parity-checking FSA, data is input serially and the current state outputs the bit needed for odd parity. Note the boundary condition – when no data has been input (empty input), the parity bit is 1. If the 9-bit example above is sent through the parity checker and the output of the final state does not agree with the parity bit, a parity error has occurred.

Page 67 2. Sequential binary adder – input is pairs of binary digits and output is their sum; carry-in, carry-out information is tracked by the current state. • • •



Input alphabet – {00,01,10,11} Output alphabet – {0,1} States are {S0, S1, S2, S3} as follows: S0 outputs 0, no carry S1 outputs 0, carry S2 outputs 1, no carry S3 outputs 1, carry Transitions are given by the state diagram 00 01,10 S0/0

01,10

11

11

00

00 01,10

01,10

S3/1

S1/0

States with no carry

S2/1

00

11

States with carry

11 Trace: 0 1 0 0 1 +0 1 0 1 1 The input pairs are (1,1),(0,1),(0,0),(1,1),(0,0). For [current-state, current input] the transitions are [S0, 11] → S1 output 0 to carry state [S1, 01] → S1 output 0 remain in carry state to no carry state [S1, 00] → S2 output 1 [S2, 11] → S1 output 0 to carry state [S1, 00] → S2 output 1 to no carry state (final) so the result is 1 0 1 0 0 as expected. Generally, the structure of the FSA can be determined from the state diagram, so usually only the state diagram is specified in the design process. The next step is to detail how the FSA is converted to a circuit.

Page 68 The sequential circuit design process is conducted as follows: 1. Problem statement 2. State diagram 3. Elimination of inaccessible states (if any) – these are states that cannot be reached from the Start State 4. Assignment of states to flip-flop combinations: # of states # of ff’s needed 1 or 2 1 3 or 4 2 5,6,7 or 8 3 . . . and so forth 5. Transition/output table – control values producing the needed next state behavior are determined from flip-flop excitation tables current states

inputs

next states

controls

outputs

6. K-map analysis to produce • control equations • output equations 7. Circuit diagram Example: Parity checker using JK flip-flops. Steps 1 and 2 were done earlier. There are no inaccessible states. Step 4: Assignment of states to flip-flop combinations. Since there are only 2 states, 1 flip-flop (Q0) can represent both. 0 0 1 State Q0 S0 0 S1 1 S1/0 S0/1 1 Step 5: Transitiion table 1 Q0next

Q0 I 0 0 1 1

0 1 0 1

0 1 1 0

J K

Z

0 1 -

1 1 0 0

0 1

Recall: JK flip-flop excitation table

Step 6: K-map analysis for J,K and Z I I 0 1 0 Q0 Q0 0

0 -

1

- 0

0

2

1 - 1

1

0

1

1

0

3

J=I, K=I

0

2

1 1 0

__ Z= Q 0

1

3

Q Qnext

J

K

0 0 1 1

0 1 -

1 0

0 1 0 1

Page 69 Step 7: Circuit for parity checker J

__ Q0

I

K

Z

clock

Example: Binary adder using JK flip-flops Steps 1 and 2 were done earlier. Step 3: There are no inaccessible states. Step 4: Assignment of states to flip-flop combinations. Since there are 4 states, 2 flip-flops (Q0,Q1) will be needed. State

Q0

Q1

S0 S1 S2 S3

0 0 1 1

0 1 0 1

Step 5: Transition/output table

S0

S1

S2

S3

Q0

Q1

I0

I1

Q0n

Q1n

0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1

0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1

0 0 0 1 0 1 1 1 0 0 0 1 0 1 1 1

S0 S2 S2 S1 S2 S1 S1 S3 S0 S2 S2 S1 S2 S1 S1 S3

J0

K0

J1

K1

Z

0 1 1 0 1 0 0 1 -

1 0 0 1 0 1 1 0

0 0 0 1 0 0 0 1 -

1 0 0 0 1 0 0 0

0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

Recall: JK flip-flop excitation table Q Qnext

J

K

0 0 1 1

0 1 -

1 0

0 1 0 1

00 01,10 S0/0

01,10 S2/1

00 11

11

00

00 01,10

01,10

S3/1

S1/0 11

11

Page 70 J0, K0, J1, K1 can be resolved via K-maps. observe an XOR pattern. I0I1

I0I1 00

Q0Q1

Note that J0 and K0

01

00

0 -

01

1 -

11

- 0

10

- 1

11

10

1

0

1 0 13

12

- 1

15

- 0 9

8

- 0

11

- 1

01

00

0 -

01

- 1

11

- 1

10

0 -

0 -

14

10

13

0 -

J0 K0 ___ J0 = Q1(I0 ⊕ I1) + Q 1 (I0 ⊕ I1) = Q1 ⊕ I0 ⊕ I1 ___ ___ K0 = Q 1 (I0 ⊕ I1) + Q1(I0 ⊕ I1) = Q 1 ⊕ I0 ⊕ I1

- 0 14

15

- 0

- 0 9

8

- 0

6

7

- 0

- 0

2

0 -

1 -

- 0

12

- 1

10 3

5

4

0 -

11 1

0

6

7

1 -

00

Q0Q1

1 -

0 5

4

2

3

10

11

0 -

1 -

J1 K1

J1 = I0 I1 ___ ___ K1 = I 0 I 1 = I0 ↓ I1 By observation, Z = Q0 The circuit construction is then given by: Z J0 Q0

J1 Q1

__ K0 Q 0

__ K1 Q 1

I0 I1

Counter design: Counters can have particularly simply design. counter has the state diagram: 0000

0001

0010

0011

0100

1001

1000

0111

0110

0101

For example, a BCD

Page 71 Transitions are made with the clock. External inputs are not required. States are named using flip-flop values. The transition/output table is then Q3 Q2 Q1 Q0

Q3n Q2n Q1n Q0n

J3 K3 J2 K2 J1 K1 J0 K0

0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 1 0 0 0 1 0 0 1 the rest are

0 0 0 0 0 0 0 1 0 1 0 1 0 1 1 0 1 0 0 0 don’t

0 0 0 0 0 0 0 1 -

0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 1 0 0 cares

0 1

0 0 0 1 0 0

0 0 0 1 -

Q1Q0

0 1 0 1 -

1 1 1 1 1 -

01

00

0 -

01

0 -

11

- -

10

- 0

11

0 -

13

0 -

9

- 1

- -

00

0 -

01

- 0

11

- -

- -

10

0 -

11

0 5

J3,K3 J3 = Q2Q1Q0 K3 = Q0

- 0 14

15

- -

- 9

0 -

6

7

13

8

0 -

- 1

- -

2

3

1 -

- 0

12

10

1

4

10

11

- -

01 0

14

15

- -

00

Q3Q2

6

7

1 -

- 8

0 -

0 -

0 -

2

3

5

4

12

10

1

0

10

11

- -

- -

J2,K2 J2 = Q1Q0 K2 = Q1Q0

Q1Q0 Q3Q2

1 1 1 1 1

Q1Q0 00

Q3Q2

0 1 0 1 0 0

Q1Q0 00

01 0

00

0 -

01

0 -

11

- -

10

0 -

11

1 4

5

15

- 9

0 -

11

- -

J1,K __1

J1 = Q 3Q0 K1 = Q0

Q3Q2

00

01 0

00

1 -

01

1 -

11

- -

10

1 -

6

7

13

8

- 0

- 1

- -

2

3

- 1

1 -

12

10

1

- 0

10

5

15

- 9

- 1

6

7

13

8

1 -

- 1

- -

2

3

- 1

- 1

12

10

- -

- 1 4

14

- -

11 1

11

- -

1 14

- 10

- -

J0,K0 J0 = 1 K0 = 1

The counter operates synchronously with the clock. Note that Q0 is common to each of J3, K3, J2, K2, J1, K1. Hence if we assign CK3 = Q0, J3 = Q2Q1, and K3 = 1, we have the same effect as the original assignment when the clock is high. Likewise assign CK2 = Q0, J2 = Q1, K2 = Q1 and CK1 = Q0, J1 = Q3', K1 = 1. The counter now operates asynchronously with the clock attached to CK0. Observe that the Q0 flip-flop is operating as a T flip-flop (not a surprise, since the 1's position of the counter toggles with each increment).

Moore and Mealy circuits: For a Moore circuit, the outputs are strictly a function of the states. For a Mealy circuit, the outputs are a function of the inputs as well as the states.

[Block diagrams: in a Moore circuit the output logic is driven only by the state flip-flops; in a Mealy circuit the output logic is driven by both the state flip-flops and the current inputs. Input and clock lines are otherwise the same.]

Circuit Analysis: reverse the design process.
  1. Produce control and output equations from the circuit
  2. Generate the transition/output table from the equations
  3. Determine the next state columns in the transition/output table
  4. Draw the state diagram

Example: starting from the following circuit diagram, assume that the start state is (Q0,Q1) = (0,0).

[Circuit: two JK flip-flops Q0 and Q1 with external inputs I0, I1 and output Z, wired according to the equations below.]

Circuit equations:

  J0 = I0                  K0 = I0 ⊕ I1
  J1 = (I0 + Q0 + I1)'     K1 = I0·Q0' + I1'
  Z = I1·Q0·Q1'

Transition/output table (states encoded as before: S0 = (Q0,Q1) = (0,0), S1 = (0,1), S2 = (1,0), S3 = (1,1)):

  Q0 Q1 I0 I1 │ Q0n Q1n next │ J0 K0 J1 K1 │ Z
   0  0  0  0 │  0   1   S1  │  0  0  1  1 │ 0
   0  0  0  1 │  0   0   S0  │  0  1  0  0 │ 0
   0  0  1  0 │  1   0   S2  │  1  1  0  1 │ 0
   0  0  1  1 │  1   0   S2  │  1  0  0  1 │ 0
   0  1  0  0 │  0   0   S0  │  0  0  0  1 │ 0
   0  1  0  1 │  0   0   S0  │  0  1  0  0 │ 0
   0  1  1  0 │  1   0   S1  │  1  1  0  1 │ 0
   0  1  1  1 │  1   0   S1  │  1  0  0  1 │ 0
   1  0  0  0 │  1   0   S2  │  0  0  1  1 │ 0
   1  0  0  1 │  0   1   S1  │  0  1  0  0 │ 1
   1  0  1  0 │  0   0   S0  │  1  1  0  1 │ 0
   1  0  1  1 │  1   1   S3  │  1  0  0  0 │ 1
   1  1  0  0 │  1   0   S2  │  0  0  0  1 │ 0
   1  1  0  1 │  0   1   S1  │  0  1  0  0 │ 0
   1  1  1  0 │  0   0   S0  │  1  1  0  1 │ 0
   1  1  1  1 │  1   1   S3  │  1  0  0  0 │ 0

State diagram (Mealy circuit; edges labeled input I0I1 / output Z):

  S0: 00/0 → S1;  01/0 → S0;  10/0, 11/0 → S2
  S1: 00/0, 01/0 → S0;  10/0, 11/0 → S1
  S2: 00/0 → S2;  01/1 → S1;  10/0 → S0;  11/1 → S3
  S3: 00/0 → S2;  01/0 → S1;  10/0 → S0;  11/0 → S3

Remark: the semantics of the circuit can only be inferred from the state diagram; also, don't care conditions used in the original design are unknown, since they are accounted for in the circuit.

Example: Given the control and output equations

  J0 = X ⊕ Y        K0 = X'Q1 + Q0
  J1 = Y'Q0 + Q1    K1 = Q0'
  Z = Q1

the transition/output table is given by

Transition/output table:

  Q0 Q1 X  Y  │ Q0n Q1n next │ J0 K0 J1 K1 │ Z
   0  0  0  0 │  0   0   S0  │  0  0  0  1 │ 0
   0  0  0  1 │  1   0   S2  │  1  0  0  1 │ 0
   0  0  1  0 │  1   0   S2  │  1  0  0  1 │ 0
   0  0  1  1 │  0   0   S0  │  0  0  0  1 │ 0
   0  1  0  0 │  0   0   S0  │  0  1  1  1 │ 1
   0  1  0  1 │  1   0   S2  │  1  1  1  1 │ 1
   0  1  1  0 │  1   0   S2  │  1  0  1  1 │ 1
   0  1  1  1 │  0   0   S0  │  0  0  1  1 │ 1
   1  0  0  0 │  0   1   S1  │  0  1  1  0 │ 0
   1  0  0  1 │  0   0   S0  │  1  1  0  0 │ 0
   1  0  1  0 │  0   1   S1  │  1  1  1  0 │ 0
   1  0  1  1 │  0   0   S0  │  0  1  0  0 │ 0
   1  1  0  0 │  0   1   S1  │  0  1  1  0 │ 1
   1  1  0  1 │  0   1   S1  │  1  1  1  0 │ 1
   1  1  1  0 │  0   1   S1  │  1  1  1  0 │ 1
   1  1  1  1 │  0   1   S1  │  0  1  1  0 │ 1
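Analysis step 2 is mechanical enough to automate. This minimal Python sketch (an added illustration, not part of the original notes) regenerates the rows of the table above directly from the control and output equations:

  for q0 in (0, 1):
      for q1 in (0, 1):
          for x in (0, 1):
              for y in (0, 1):
                  j0, k0 = x ^ y, ((1 - x) & q1) | q0   # J0 = X xor Y, K0 = X'Q1 + Q0
                  j1, k1 = ((1 - y) & q0) | q1, 1 - q0  # J1 = Y'Q0 + Q1, K1 = Q0'
                  q0n = (j0 & (1 - q0)) | ((1 - k0) & q0)
                  q1n = (j1 & (1 - q1)) | ((1 - k1) & q1)
                  print(q0, q1, x, y, "->", q0n, q1n, " Z =", q1)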

State diagram: assume (0,0) is the start state (Moore circuit; edges labeled with inputs XY):

  S0/0: 00, 11 → S0;  01, 10 → S2
  S1/1: 00, 11 → S0;  01, 10 → S2
  S2/0: 00, 10 → S1;  01, 11 → S0
  S3/1: 00, 01, 10, 11 → S1

The isolated state S3 (never reached from the other states) is an artifact of the circuit implementation.

Other Counters: The first counter considered was a mod 2^n ripple counter, a natural counter formed by hooking T flip-flops up in series. It required no additional gate logic and was easily devised without resorting to sequential design techniques. In contrast, the BCD counter exemplifies designing a counter by working from a state diagram. In the BCD counter as given, no attention was paid to the 6 states present in the circuit but not used in the counting process. In particular, if the circuit initiated in one of these 6 states, its behavior would be unspecified. Hence, the user must initialize the flip-flops to 0 to assure that the counter gets to the BCD counting sequence. A self-starting counter is one which transitions to its counting sequence regardless of the state in which the circuit is initiated.

A counter that employs n flip-flops is called an n-stage counter. Using a state diagram in designing a counter automatically minimizes the number of stages, but there are useful counters that employ more than the minimum. A shift-register counter counts by using a circular shift to move a bit pattern through the register. For example, to count 4, the pattern might be 1000, 0100, 0010, 0001.

[Register layout: four D flip-flops in series sharing the clock, each stage's output feeding the next stage's D input and the last output fed back to the first stage; an initialize line presets the 1000 pattern.]

There are reasons to use this kind of counter (e.g., to produce a sequence of "polling" signals, where each flip-flop enables the device being polled). There are 12 other bit patterns:

  0111, 1011, 1101, 1110  and  0011, 1001, 1100, 0110
  0101, 1010
  0000 and 1111

These are grouped according to how they would count (the first group has two patterns that count 4, the second group has a pattern that counts 2, and the last group has two patterns that count 1). It's obvious that initialization is important if this kind of counter is to be employed. The counter can be constructed to force it to move to the desired counting sequence by adjusting the D0 input (currently Q3) for those cases that are not in the right sequence.

  Q0 Q1 Q2 Q3 │ D0 │
   0  0  0  0 │ 1  │ (force change from Q3)
   0  0  0  1 │ 1  │ (OK)
   0  0  1  0 │ 0  │ (OK)
   0  0  1  1 │ 0  │ (force change from Q3)
   0  1  0  0 │ 0  │ (OK)
   0  1  0  1 │ 0  │ (force change from Q3)
   0  1  1  0 │ 0  │ (self-correct on later cycle)
   0  1  1  1 │ 0  │ (force change from Q3)
   1  0  0  0 │ 0  │ (OK)
   1  0  0  1 │ 0  │ (force change from Q3)
   1  0  1  0 │ 0  │ (self-correct on later cycle)
   1  0  1  1 │ 0  │ (force change from Q3)
   1  1  0  0 │ 0  │ (self-correct on later cycle)
   1  1  0  1 │ 0  │ (force change from Q3)
   1  1  1  0 │ 0  │ (self-correct on later cycle)
   1  1  1  1 │ 0  │ (force change from Q3)

From this it can be seen that D0 = Q0'·Q1'·Q2' rather than D0 = Q3 will cause the counter to fall into the 1000, 0100, 0010, 0001 pattern within 3 clock cycles. The counter thus becomes self-starting. The initialization can be retained, but the above minor change enables the counter to return to its expected behavior in the event an anomalous event knocks the counter out of sequence at some point after initialization. To make a counter self-starting, any unused states simply need to be accounted for in the design. For example, for a counter counting 6 (which requires a minimum of 3 flip-flops), a state diagram such as

[Ex1 and Ex2: two example diagrams, each the counting cycle 0 → 1 → 2 → 3 → 4 → 5 → 0 with the two extra states routed into the cycle]

accounts for the 2 extra states that will occur when a circuit is implemented using 3 flip-flops and ensures that the counter will be in its counting sequence within 1 clock cycle.

Johnson counter: an n-stage count-by-2n counter based on shift-register counting. Johnson counters cycle the complement of the final flip-flop in the sequence back to the first stage to double the counting period. For example, a count by 8 Johnson counter has the form:

[Four D flip-flops in series with initialize and clock lines as before, but with the complement of the last stage fed back to the first stage's D input.]

with counting sequence 1000, 1100, 1110, 1111, 0111, 0011, 0001, 0000. Note that 8 states are unused, so either the counter has to be forced to its counting sequence, or it has to be initialized.
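A minimal Python sketch (an added illustration; the D0 correction is the one derived above) shows the self-starting behavior - any start state falls into the 1000, 0100, 0010, 0001 cycle within a few clocks:

  def tick(q):
      q0, q1, q2, q3 = q
      d0 = 1 if (q0 | q1 | q2) == 0 else 0   # D0 = Q0'Q1'Q2' replaces D0 = Q3
      return (d0, q0, q1, q2)                # circular shift with the correction

  state = (1, 1, 0, 1)                       # an out-of-sequence start
  for _ in range(8):
      print(state)
      state = tick(state)

Changing tick to return (1 - q3, q0, q1, q2) instead models the Johnson counter's feedback of the complemented final stage.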

Barrel Shifter: Recall that a shift register shifts 1 bit at a time. If the shift amount is more than 1, then the process has to be repeated until the specified amount of shifting has been accomplished. A barrel shifter uses multiplexers to determine the shift so that it can be accomplished in one cycle. For a 4-bit register, a barrel shifter accomplishing a circular shift right of 0, 1, 2, or 3 (specified via (s0,s1)) is structured as follows:

[Four D flip-flops, each fed by its own 4 to 1 MUX; every MUX receives all four flip-flop outputs on its data inputs 0-3 and shares the address lines (s0,s1).]

Note that each flip-flop is controlled by a multiplexer, which is used to select the input sent to the flip-flop. The multiplexer's function is to route the value selected according to its address lines to the flip-flop's input. To set up the circuit as a shift register, the 4 multiplexer input data lines are simply hooked up to the flip-flop outputs so that each address matches a shift value, with address 0 matching a shift of 0, address 1 matching a shift of 1, and so forth. Thus, the amount of the shift is entered via the address lines (s0,s1). The circuit can be reconfigured for different shift patterns by simply hooking up the multiplexer input data lines to the flip-flop outputs (or other data values) in different ways.
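In software terms the barrel shifter is a one-pass selection; this minimal Python sketch (an added illustration, not part of the original notes) mirrors the MUX wiring, with output bit i selecting input bit (i - amount) mod 4:

  def barrel_rotate_right(bits, amount):
      n = len(bits)
      return [bits[(i - amount) % n] for i in range(n)]

  print(barrel_rotate_right([1, 0, 1, 1], 2))   # [1, 1, 1, 0]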

Glitches and hazards: Physically, there is a time lag in a combinational circuit from the point in time that input signals are applied until their effect propagates through the various components of the circuit and the outputs react to the inputs. This is called the propagational delay. A manufacturer may include the expected propagational delay as a part of circuit specifications. Propagational delay is a physical reality with consequences that may affect circuit behavior, particularly that of a sequential circuit. To illustrate this point, consider the circuit given by

  f = AB' + A'C

Assume that f is implemented by

[Circuit: an AND gate forming A·B' with delay ∆t1 and an AND gate forming A'·C with delay ∆t2, feeding an OR gate with delay ∆t3 that produces f]

where ∆t1, ∆t2, ∆t3 give the propagational delay associated with the delineated components. Assume also that ∆t1 > ∆t2.

For purposes of illustration, suppose that the inputs A, A', B, B', C, C' are changing (synchronously) according to a common timing pattern.

[Timing diagram: waveforms for A, B, and C, each switching between Logic 0 and Logic 1.]

If we extend the timing diagram to track the circuit components as they react to the inputs using a similar timing diagram, we obtain the following:

[Timing diagram: A, B, C; AB' lagging by ∆t1; A'C lagging by ∆t2; AB' + A'C lagging further by ∆t3 and showing a brief spurious pulse - the glitch.]

The delays ∆t1 and ∆t2 coupled with the changing values of A, B, C produce a signal variance in the expected value of f that would not happen in the absence of propagational delay. This variance, called a circuit glitch, appears in the form of a brief pulse, which could possibly trigger a state change elsewhere in the circuit. The component organization which causes it is called a hazard. An examination of the K-map for f is instructive in determining the source of the hazard.

        BC
         00  01  11  10
  A  0 │  0   1   1   0
     1 │  1   1   0   0

As can easily be determined, the formulation for f we started with is in fact a minimal sum-of-products expression, using two of the three prime implicants of f. These two prime implicants (AB' and A'C) cover the third prime implicant, B'C, which is usually considered unnecessary, since logically

  f = AB' + A'C = AB' + A'C + B'C

Assuming some appropriate propagational delay for B'C (∆t4), consider what happens to the timing diagram when using the formulation f = AB' + A'C + B'C:

[Timing diagram: A, B, C; AB'; A'C; AB' + A'C (with the glitch); B'C; AB' + A'C + B'C - the glitch is gone.]

It is evident that adding the logically redundant term back into the expression has eliminated the glitch! There are some subtle points to consider. The assumption that inputs A, A', B, B', C, C' change synchronously is critical. For example, consider the following (admittedly nonsensical) construction using separate NOT and AND gates (with propagational delays as indicated):

[A feeds a NOT gate with delay ∆t1; A and the NOT gate's output A' feed an AND gate with delay ∆t2, producing A·A'.]

The timing diagram for this circuit is as follows:

[Timing diagram: A; A' lagging A by ∆t1; A·A' showing a brief 1 pulse of width ∆t1 after each rising edge of A, further delayed by ∆t2.]

Even for a simple construction such as this (or a similarly constructed prime implicant) a glitch is experienced. In our first example, such problems were avoided by synchronizing all inputs (including complements) to the prime implicants. In practice, glitches are not a great concern for combinational circuits (especially since the outputs are typically used to drive devices slow to react, such as lights). Although it is possible that a glitch's duration may be too short for a component such as a flip-flop to react, their presence is an obvious cause for concern in sequential circuits, which may change state unexpectedly (and hence perform incorrectly) on a misplaced signal pulse. In general, when inputs to prime implicants (or implicates) are synchronized in a combinational circuit, circuit hazards occur where two prime implicants (or implicates) that are non-overlapping have adjacent cells. Adding back the logically redundant (non-essential) prime implicants (or implicates) serves to eliminate the hazards causing such glitches. It should be noted that under this scenario, there may be a dramatic difference in the choice between using the sum-of-products form or the product-of-sums form. For example, consider the K-map

        BC
         00  01  11  10
  A  0 │  0   1   1   1
     1 │  1   1   0   1

There are 6 prime implicants and 2 prime implicates, yielding the following two hazard-free formulations:

  A'B + B'C + AC' + AB' + BC' + A'C
  (A + B + C)(A' + B' + C')

It is evident that the product-of-sums expression is simpler. This occurs because the removal of circuit hazards from the sum-of-products

form requires adding in prime implicants that are logically non-essential or redundant.

There are other alternatives. If a particular input combination triggering a glitch does not occur in actual implementation, then the associated hazard does not need to be addressed. Another strategy is to employ a flip-flop (perhaps a D flip-flop on the trailing edge of the input synchronization) to latch the value of f at a point after all glitches have occurred. Under this scenario, the circuit performance is slowed until the flip-flop outputs are set. A third alternative is to use an added synchronizing signal to hold the output at a (known) fixed value until the danger of glitches is past. In general this strategy takes the form:

[Block diagram: the inputs feed a glitch-prone combinational circuit; its outputs pass through gates controlled by a "synch" signal before reaching the synchronous outputs.]

In this case, there is the added complication of having to provide careful timing for the added "synch" signal. By setting the synch signal to 0 at the beginning of each cycle of input synchronization, all outputs of the glitch prone circuit can be held at 0 through the setup period when glitches are likely to occur, regardless of the presence of hazards. When the chance for glitches is past, the synchronizing signal is then changed to allow each output to cleanly switch to its logical value for the current set of inputs. This strategy does not have the longer implicit delay that is present in our second alternative, but does require close coordination with the system signal that is being used to synchronize the circuit inputs. At this point it should be noted that every strategy for dealing with glitches (even the one of removing hazards) has an element of synchronization associated with it. This is the primary reason that asynchronous sequential circuits have limited utility.
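The static hazard in f = AB' + A'C is easy to reproduce numerically. Below is a minimal time-stepped Python sketch (an added illustration; the unit delay on the inverter producing A' is an assumption, and the AND/OR delays are ignored to keep it short). With B = 0 and C = 1, both product terms are briefly 0 when A falls, and the redundant term B'C holds the output at 1:

  B, C = 0, 1
  A = [1, 1, 1, 0, 0, 0]                            # A falls at t = 3
  notA = [1 - A[max(t - 1, 0)] for t in range(6)]   # A' lags A by one step
  for t in range(6):
      term1 = A[t] & (1 - B)                        # A B'
      term2 = notA[t] & C                           # A' C (uses the delayed complement)
      cover = (1 - B) & C                           # redundant term B' C
      print(t, "f =", term1 | term2, "  with B'C added:", term1 | term2 | cover)

The run prints f = 0 at the step where A has fallen but A' has not yet risen - the glitch - while the hazard-free version stays at 1 throughout.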

Constructing memory: Generally a memory block is organized to have
  • address lines to determine which bits in the block to access
  • bidirectional data lines to send data to an addressed location in memory (write operation) or retrieve data from the addressed location (read operation)
  • an R/W line to specify a read operation or a write operation
  • an enable line to activate the memory block for read or write access

A single bit is a 1×1 block of memory and can be represented by a flip-flop (no address line is needed).

[1×1 cell: a D flip-flop with a bi-directional data line, an enable (chip select, CS) line, and an R/W line (R=1, W=0).]

A 2×1 block of memory can now be constructed from two 1×1 cells. A 1 of 2 decoder is needed to address the 1×1 cell wanted:

[2×1 block: an address line addr drives a 1 of 2 decoder whose outputs gate the CS inputs of the two 1×1 cells; the cells share the data line d0 and the R/W line.]

A 4×1 block can be constructed from two 2×1 blocks using a 1 of 2 decoder, or from four 1×1 blocks using a 1 of 4 decoder. These two equivalent constructions appear as:

[Left: address lines (a1,a0), with a1 driving a 1 of 2 decoder that selects one of two 2×1 blocks and a0 addressing within the selected block. Right: (a1,a0) driving a 1 of 4 decoder that selects one of four 1×1 blocks. In both, all blocks share the data line d0 and the R/W line, and the decoder outputs gate the CS inputs.]

Note that for the construction using two 2×1 cells, a1 selects a 2×1 cell and a0 selects a bit from within the cell. In effect, when larger memory blocks are constructed from smaller memory blocks, the higher order bits of the address are used to select one of the smaller blocks and the lower order bits are used to select the data item from within the selected smaller block. The memory modules in the 4×1 block can be arranged to construct a 2×2 block with 2 data lines, instead of 1:

[2×2 block: a single address line a0 feeds the CS inputs of two 2×1 blocks in parallel; one block drives data line d0 and the other d1, with the R/W line shared.]

A memory chip has a fixed capacity in bits, which can be organized either in favor of addressability (4×1 requires more address lines than 2×2) or in favor of data groups (2×2 provides 2 bits per data group vs. 1 bit for 4×1).
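The addressing rule just described (high-order bits pick the block, low-order bits pick the cell) is worth seeing concretely; a minimal Python sketch (an added illustration, not part of the original notes):

  def decode(addr, cells_per_block):
      return addr // cells_per_block, addr % cells_per_block   # (block, offset)

  print(decode(0b101101, 16))   # 6-bit address, 16-cell blocks -> (2, 13)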

Note that in general, accessing a location in memory requires a large decoder. In practice, a 1 of 16 decoder requires a 24 pin package (4 address lines, 16 data lines, Vcc, GND, and CS), which indicates building a large decoder as a single chip is impractical. However, it is easy to build larger decoders from smaller ones; for example, a 1 of 64 decoder can be constructed from five 1 of 16 decoders as follows:

[Address lines a5,a4 drive one 1 of 16 decoder (two of its inputs unused) whose outputs gate the CS inputs of four more 1 of 16 decoders; each of those decodes a3,a2,a1,a0, together producing the 64 output lines.]

This structure can obviously be extended to provide a decoder for any address requirement (albeit by using a lot of chips; for this reason, address decoding is normally a built-in feature of a memory chip). Hence, arbitrarily large memory blocks can be constructed. Memory is generally classified as
  • RAM memory - Random Access Memory (so-called since any randomly generated address can be accessed directly, which contrasts to a serial memory such as a magnetic tape). RAM memory can be both read from and written to.
  • ROM memory - Read Only Memory (non-volatile memory with a fixed content that can be read from, but not written to). There are multiple varieties of ROM, some of which can be rewritten and some not. For example, EPROM (erasable programmable ROM) is ROM which can be erased (by ultra-violet exposure) and is written by special circuitry operating at a higher voltage; PLA's (programmable logic arrays) start as a rectangular array of fuseable links which are selectively blown to create (permanent) bit patterns that then form a ROM; FPGA's (field programmable gate arrays) are another variation, and can be rewritten with special circuitry; CD-ROM's are yet another and may be either rewriteable or not.

Memory is organized as a 2^i × 2^j array of bits with i address lines and 2^j data lines. The number of data lines is called the word size of the memory. If j=3, then the word size is 8; since 8 bits is a byte, the memory would then have a capacity of 2^i bytes. If j=5, then the word size is 32, or 4 bytes. A 256×8 memory has 8 address lines (since 2^8 = 256) and 8 data lines. For RAM memory, data lines are bi-directional and the memory includes both R/W and enable control lines. The overall memory configuration has the appearance:

[Memory block: address lines in, bi-directional data lines, and R/W and enable control lines.]

As already noted, larger memory units can be constructed from smaller ones by arranging the blocks in a grid, tying all R/W lines together, and using a decoder to select rows.

Example: Construction of a 256 byte memory with word size of 4 bytes using 16 byte memory modules. The specification calls for 256/4 = 64 words. Each word has 4 bytes, so there are 32 data lines. To get 64 words using 16 byte modules, there need to be 64/16 = 4 rows, each having 4 modules. 256 bytes requires 6 address lines. Hence, the memory should appear as a 4×4 grid with 6 address lines and 32 bi-directional data lines:

[4×4 grid of 16 byte modules: address bits a5,a4 drive a 1 of 4 decoder that gates the CS inputs of the modules row by row; a3,a2,a1,a0 run to every module to select a byte within the active row; each column of modules drives one group of data lines (d0...d7, d8...d15, d16...d23, d24...d31).]

Note that the high order bits of the address are tied to the 1 of 4 decoder and the 4 lower order bits address the memory modules across each row. The decoder activates a row and the lower order bits select a word within that row. The R/W lines are omitted because they are all tied together. The addressing requirement can be reduced by using memory blocks that require 2 select (enable) inputs (S1,S2):

[Memory block with data and R/W lines plus two select inputs S1 and S2, both of which must be active to enable the block.]

Arranging these in a rectangular grid effectively halves the decoder requirement; e.g., a 256 = 2^8 byte memory module requires a 1 of 256 decoder. If blocks using 2 select inputs are employed, and the memory is arranged in a 16×16 grid, with a 1 of 16 decoder accessing the S1 lines and another 1 of 16 decoder accessing the S2 lines, then all blocks are accessed and only 32 decoder lines have been used instead of 256! Building decoding into a memory module obviously reduces the need for large external decoding circuits.

Memory sizes are given by employing standardized prefixes as follows:

  International Unit (base 10) Prefixes, 1993
  10^24  yotta        10^-1   deci
  10^21  zetta        10^-2   centi
  10^18  exa          10^-3   milli
  10^15  peta         10^-6   micro
  10^12  tera         10^-9   nano
  10^9   giga         10^-12  pico
  10^6   mega         10^-15  femto
  10^3   kilo         10^-18  atto
  10^2   hecto        10^-21  zepto
  10^1   deca         10^-24  yocto

These are used directly with base 10 measures; e.g.,
  picosecond (1 trillionth of a second = 10^-12 s)
  millimeter (1 thousandth of a meter = 10^-3 m)
  megaflop (1 million floating point operations per second = 10^6 flops)
They are also used with measures based on 1K = 2^10 = 1024 ≈ 1000 = 10^3; e.g., gigabyte (≈ 1 billion bytes).

Sequential circuit clock speed is measured in Hertz, where 1 Hertz ≡ 1 Hz ≡ 1 cycle per second. This is a measure CPU manufacturers often cite with respect to processor speed (e.g., a 2.5 GHz processor has speed measured in GigaHertz). Example: 100 MegaHertz = 100 MHz = 100×10^6 Hz = 10^8 cycles per second. 10^8 cycles per second is 10/10^9 seconds per cycle, or 10 nanoseconds per cycle. Memory generally operates at slower speeds than the processor, which means it is accessed asynchronously (on a different clock timing). A delay of 3 nanoseconds is 3/10^9 seconds. If signals need to occur at no more than 1/3rd this rate, then clock pulses are limited to one per 9/10^9 seconds, implying a clock speed of no more than 10^9/9 Hz - no faster than about 111 MHz.

Implementing Circuits Using ROMs: We have already observed that combinational circuits can be implemented by discrete logic gates or by using higher order circuits such as decoders and multiplexers. They can also be implemented by using ROMs.

Combinational circuits: The information in the truth table specification for a combinational circuit can be viewed as specifying the contents for a ROM implementation of the circuit; e.g., the circuit specification for the function f below can be implemented by an 8×1 ROM whose contents are given by the specification:

  Specification for f       8×1 ROM
  X  Y  Z │ f               Address   Contents
  0  0  0 │ 1                 000:       1
  0  0  1 │ 0                 001:       0
  0  1  0 │ 0                 010:       0
  0  1  1 │ 0                 011:       0
  1  0  0 │ 1                 100:       1
  1  0  1 │ 1                 101:       1
  1  1  0 │ 0                 110:       0
  1  1  1 │ 1                 111:       1

The inputs form the address (X = A2, Y = A1, Z = A0) and the stored data is the output f.

For contrast, recall the alternative approaches for the same specification as illustrated below.

K-map analysis and logic gate implementation:

        YZ
         00  01  11  10
  X  0 │  1   0   0   0
     1 │  1   1   1   0

  f = Y'Z' + XZ = (Y + Z)' + XZ

[Gates: a NOR of Y and Z, an AND of X and Z, and an OR combining the two terms into f.]

Multiplexer implementation: pairing the truth table rows by (X,Y) gives the data inputs for a 4×1 MUX addressed by X (weight 2) and Y (weight 1):

  X Y │ f
  0 0 │ Z'
  0 1 │ 0
  1 0 │ 1
  1 1 │ Z

[4×1 MUX: data inputs 0-3 wired to Z', 0, 1, Z; address lines driven by X and Y; the output is f.]

Decoder implementation: a 1 of 8 decoder driven by X (4), Y (2), Z (1) raises one minterm line per input combination; OR'ing the lines for minterms 0, 4, 5, and 7 (the rows where f = 1) produces f.

[1 of 8 decoder with inputs X, Y, Z; output lines 0, 4, 5, 7 feed an OR gate whose output is f.]

The K-map logic gate approach requires the most analysis but uses the simplest components. The contrasting ROM implementation requires the least analysis, but this advantage is offset by having to "burn" the desired contents into each ROM memory cell.
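In software, the ROM approach is literally a table lookup; a minimal Python sketch (an added illustration, not part of the original notes) of the 8×1 ROM above:

  ROM = [1, 0, 0, 0, 1, 1, 0, 1]              # contents from the specification

  def f(x, y, z):
      return ROM[(x << 2) | (y << 1) | z]     # (X,Y,Z) = (A2,A1,A0) form the address

  for m in range(8):
      print(m, f(m >> 2 & 1, m >> 1 & 1, m & 1))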

Sequential Circuits: In a similar fashion, in the transition/output table for a sequential circuit, the current state and input columns can be viewed as providing ROM addresses that point to memory locations where the next state information is stored. Circuit outputs can likewise be stored at ROM addresses pointed to by the current state and input columns. To illustrate this, consider the following state diagram for a sequential circuit:

[State diagram - states with outputs S/0000, T1/0001, T2/0011, T3/0111, T4/1111, H/0000; edges labeled input/Z:
  S:  A/1 → S,  B/0 → H,  C/0 → T1, D/1 → S
  T1: A/1 → S,  B/0 → H,  C/0 → T1, D/0 → T2
  T2: A/0 → T3, B/0 → H,  C/1 → S,  D/0 → T2
  T3: A/0 → T4, B/0 → H,  C/0 → H,  D/0 → H
  T4: A,B,C,D/0 → T4
  H:  A,B,C,D/0 → H]

Suppose that the states and inputs are encoded as follows and that state outputs are given by variables O1, O2, O3, O4 as indicated:

  States:             Inputs:       State outputs:
        Q2 Q1 Q0            X Y           O1 O2 O3 O4
  S      0  0  0      A     0 0     S      0  0  0  0
  T1     0  0  1      B     0 1     T1     0  0  0  1
  T2     0  1  0      C     1 0     T2     0  0  1  1
  T3     0  1  1      D     1 1     T3     0  1  1  1
  T4     1  0  0                    T4     1  1  1  1
  H      1  1  1                    H      0  0  0  0

The transition/output table corresponding to the state diagram and based on this encoding is as follows:

The current state (Q2 Q1 Q0) and the inputs (X Y) form the ROM address A4 A3 A2 A1 A0; the 32×4 ROM's data lines give the next state M2 M1 M0 and the transition output Z:

  state │ Q2 Q1 Q0  X Y │ M2 M1 M0  next │ Z
  S     │  0  0  0  0 0 │  0  0  0   S   │ 1
        │  0  0  0  0 1 │  1  1  1   H   │ 0
        │  0  0  0  1 0 │  0  0  1   T1  │ 0
        │  0  0  0  1 1 │  0  0  0   S   │ 1
  T1    │  0  0  1  0 0 │  0  0  0   S   │ 1
        │  0  0  1  0 1 │  1  1  1   H   │ 0
        │  0  0  1  1 0 │  0  0  1   T1  │ 0
        │  0  0  1  1 1 │  0  1  0   T2  │ 0
  T2    │  0  1  0  0 0 │  0  1  1   T3  │ 0
        │  0  1  0  0 1 │  1  1  1   H   │ 0
        │  0  1  0  1 0 │  0  0  0   S   │ 1
        │  0  1  0  1 1 │  0  1  0   T2  │ 0
  T3    │  0  1  1  0 0 │  1  0  0   T4  │ 0
        │  0  1  1  0 1 │  1  1  1   H   │ 0
        │  0  1  1  1 0 │  1  1  1   H   │ 0
        │  0  1  1  1 1 │  1  1  1   H   │ 0
  T4    │  1  0  0  - - │  1  0  0   T4  │ 0
  -     │  1  0  1  - - │  -  -  -       │ -   (unused state)
  -     │  1  1  0  - - │  -  -  -       │ -   (unused state)
  H     │  1  1  1  - - │  1  1  1   H   │ 0

Written out as ROM contents (address = Q2 Q1 Q0 X Y, contents = M2 M1 M0 Z):

  00000: 0001   00100: 0001   01000: 0110   01100: 1000   10000-10011: 1000
  00001: 1110   00101: 1110   01001: 1110   01101: 1110   10100-11011: ----
  00010: 0010   00110: 0010   01010: 0001   01110: 1110   11100-11111: 1110
  00011: 0001   00111: 0100   01011: 0100   01111: 1110
                                                          32 × 4 ROM

Since the output associated with each state does not rely on the transition inputs X, Y, a smaller memory unit is sufficient for this part of the circuit specification: an 8×4 ROM (address = Q2 Q1 Q0 = B2 B1 B0, contents = O1 O2 O3 O4):

  000: 0000   010: 0011   100: 1111   110: ----
  001: 0001   011: 0111   101: ----   111: 0000
                                                          8 × 4 ROM

Z is the output on transitions. Note that the current state information is maintained in the 3 D flip-flops given by Q2, Q1, Q0. The next state is given by the memory data lines labeled M2, M1, M0, whose output values are applied to the inputs of the D flip-flops, ready to be latched on the transition to the next state. The implementation is almost a direct transliteration of the truth table specification for the circuit, which requires considerably
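A minimal Python sketch (an added illustration; the dictionaries simply stand in for the two ROMs) steps the circuit: each clock, the ROM word addressed by the current state and input supplies M2 M1 M0 (latched as the next state) and Z:

  rom_rows = {                                  # 32x4 ROM grouped by state; inputs A,B,C,D = 00,01,10,11
      0b000: ["0001", "1110", "0010", "0001"],  # S
      0b001: ["0001", "1110", "0010", "0100"],  # T1
      0b010: ["0110", "1110", "0001", "0100"],  # T2
      0b011: ["1000", "1110", "1110", "1110"],  # T3
      0b100: ["1000", "1000", "1000", "1000"],  # T4
      0b111: ["1110", "1110", "1110", "1110"],  # H
  }
  OUT = {0b000: "0000", 0b001: "0001", 0b010: "0011",
         0b011: "0111", 0b100: "1111", 0b111: "0000"}   # the 8x4 ROM

  state = 0b000                                 # start in S
  for xy in (0b10, 0b11, 0b00, 0b00):           # input sequence C, D, A, A
      word = rom_rows[state][xy]
      state, z = int(word[:3], 2), word[3]
      print(f"-> state {state:03b}  Z={z}  output {OUT[state]}")

This walks S → T1 → T2 → T3 → T4, with the state outputs 0001, 0011, 0111, 1111 appearing in turn.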

less analysis than implementing the circuit using gate logic. The downside again is the need to program ROMs for the circuit specification. If FPGA's or similar ROMs are available along with the means to program them, then this approach becomes a good choice for implementation, especially in light of the fact that it requires relatively few connections.

Hamming code: Adding a parity bit to a sequence of data bits provides an encoding of the data that enables detection of the presence or absence of error in the data, so long as at most 1 bit is at fault. If there is an erroneous bit, however, the approach does not identify it. The idea of parity bits can be easily extended to provide means for not only detecting the presence of an erroneous bit, but also the means for locating and correcting it. This kind of encoding is called an error correcting code. There are error correcting code techniques that will detect and correct multiple bit errors. Hamming code provides an introduction to the idea behind these coding techniques. We will only consider Hamming's single error detection/correction code.

First view data as occurring at positions 1, 2, 3, ... To show the concept, we first limit ourselves to 15 data positions. Consider the position numbers listed in binary and observe the column patterns of 1's:

      d c b a
   1  0 0 0 1     a: 1's at 1,3,5,7,9,11,13,15
   2  0 0 1 0
   3  0 0 1 1     b: 1's at 2,3,6,7,10,11,14,15
   4  0 1 0 0
   5  0 1 0 1     c: 1's at 4,5,6,7,12,13,14,15
   6  0 1 1 0
   7  0 1 1 1     d: 1's at 8,9,10,11,12,13,14,15
   8  1 0 0 0
   9  1 0 0 1
  10  1 0 1 0
  11  1 0 1 1
  12  1 1 0 0
  13  1 1 0 1
  14  1 1 1 0
  15  1 1 1 1

Any position is identified by the columns it has 1's in (i.e., 13 occurs only in columns a, c, d and 2 occurs only in column b). To elaborate, if there is a bit error in position 13, then
  • a parity check of the positions given by d will identify the problem bit as being in one of 8, 9, ..., 15.
  • A parity check of the positions given by c reduces this list to one of 12, 13, 14, 15.
  • A parity check of b doesn't include 13 and so doesn't find any error, which eliminates 14, 15 and reduces the list to one of 12, 13.

  • A parity check of the positions given by a identifies 13 as the culprit.

To summarize, if parity checks are conducted on the bit positions identified by 1's in each of columns a, b, c, d, then an error in a bit position will result in a parity error for 1 or more of these checks. The combination of the parity errors precisely locates the bit position causing the error. There are 2^4 - 1 = 15 data positions, and we need 4 parity bits, so that leaves up to 11 bits available for user data. With 5 parity checks, there would be 31 - 5 = 26 bits available for user data. If we assume the data is in bytes (i.e., we have 8 user bits), then adding on 4 bits for parity checking results in a 12 bit encoding of the data. If the parity bits are simply appended to the user bits, then some difficulty will occur in setting them. This can be avoided if the parity bits are placed at the positions which occur in only 1 column (those with a single 1: positions 1, 2, 4, 8). If the user data is 0 1 1 0 1 0 1 1, then it is encoded as

  _ _ 0 _ 1 1 0 _ 1 0 1 1
  (parity positions 1, 2, 4, 8 receive the corresponding parity check)

For even parity, this determination is as follows:

  positions 1,3,5,7,9,11:  _ _ 0 _ 1 1 0 _ 1 0 1 1  →  parity at position 1 is 1
  positions 2,3,6,7,10,11: _ _ 0 _ 1 1 0 _ 1 0 1 1  →  parity at position 2 is 0
  positions 4,5,6,7,12:    _ _ 0 _ 1 1 0 _ 1 0 1 1  →  parity at position 4 is 1
  positions 8,9,10,11,12:  _ _ 0 _ 1 1 0 _ 1 0 1 1  →  parity at position 8 is 1

The encoded user data is then

  1 0 0 1 1 1 0 1 1 0 1 1    (positions 1 through 12)

If the data is transmitted and the received data is

  1 0 0 1 1 0 0 1 1 0 1 1

then the parity checks result in
  a) positions 1,3,5,7,9,11  - OK
  b) positions 2,3,6,7,10,11 - error
  c) positions 4,5,6,7,12    - error
  d) positions 8,9,10,11,12  - OK
which identifies position dcba = 0110 = 6 as the one in error.

Note that setting the parity bits can be accomplished simply by using XOR; e.g., if the code word is notated by C[1 2 3 4 5 6 7 8 9 10 11 12], then the parity bits are obtained by

  C[1] ← C[3] ⊕ C[5] ⊕ C[7] ⊕ C[9] ⊕ C[11]
  C[2] ← C[3] ⊕ C[6] ⊕ C[7] ⊕ C[10] ⊕ C[11]
  C[4] ← C[5] ⊕ C[6] ⊕ C[7] ⊕ C[12]
  C[8] ← C[9] ⊕ C[10] ⊕ C[11] ⊕ C[12]

If an overall parity check is included at position 0, then the Hamming code word extended by this bit becomes a single error correcting, double error detecting code. The following 4 cases cover all possibilities for 2 or fewer errors:
  1. no parity error, no Hamming error ⇒ no error detected
  2. no parity error, Hamming error ⇒ double error detected
  3. parity error, no Hamming error ⇒ parity bit in error
  4. parity error, Hamming error ⇒ correctable error detected

This is easy to see:
  • If there are no errors, there are no parity errors for any of the checks and no error correction is needed. This is the "no parity error, no Hamming error" case.
  • If 2 bits are in error in the overall code word, then the overall parity will be unaffected; i.e., the overall parity check will find no error. On the other hand, since at least one of the errant bits is in the Hamming code word, the Hamming parity checks will flag an error. This is the "no parity error, Hamming error" case, and flags occurrence of a double error. In this case error correction no longer applies, since there is no way to determine which 2 bits are in error, even if one of them happens to be the parity bit, but the double error has been detected.
  • If a single bit is in error, then an overall parity error will be flagged. If the bit is the parity bit, then the Hamming code word generates no errors. This is the "parity error, no Hamming error" case, and the parity error can be corrected by changing the parity bit (so single error correction remains in effect).
  • If a single bit is in error and it is in the Hamming code word, then the Hamming parity checks locate the position of the bit. This is the "parity error, Hamming error" case, and the error can be corrected using the Hamming decoding technique.

This covers all possibilities of 0, 1, or 2 errors being present. If more than two errors are present, one of these cases will occur, but the result will be erroneous.
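A minimal Python sketch of the encoding and check (an added illustration, not part of the original notes; even parity, 1-based positions 1-12 with parity at 1, 2, 4, 8 as above):

  def set_parity(code):                            # code: dict position -> bit
      for p in (1, 2, 4, 8):
          code[p] = 0
          code[p] = sum(code[i] for i in range(1, 13) if i & p) % 2
      return code

  def syndrome(code):                              # 0 if clean, else the bad position
      return sum(p for p in (1, 2, 4, 8)
                 if sum(code[i] for i in range(1, 13) if i & p) % 2)

  data = [0, 1, 1, 0, 1, 0, 1, 1]
  code = {}
  data_positions = [i for i in range(1, 13) if i not in (1, 2, 4, 8)]
  for pos, bit in zip(data_positions, data):
      code[pos] = bit
  set_parity(code)
  print([code[i] for i in range(1, 13)])           # [1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1]

  code[6] ^= 1                                     # corrupt position 6 "in transit"
  print("error at position", syndrome(code))       # error at position 6

The position arithmetic works because position i participates in the check for parity bit p exactly when i & p is nonzero - the binary-column observation above.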

Computer Systems Level

Representing numeric fractions: Earlier we examined data representation formats for integers, Boolean values, and characters. A full processing environment also needs to include a representation format for fractions. The systems circuitry that implements these kinds of data manipulations is called the arithmetic and logic unit (ALU). One of the things that has to be considered in designing a system is whether a feature should be implemented in hardware or software. For example, floating point numbers can be implemented either in circuitry or by software. If implemented in software, the specification for the representation format can be easily changed. If implemented in hardware, then it is advantageous to use a representation standard, since changes at the hardware level carry more severe penalties than changes at the software level. The term floating point numbers is used because the representation employed is based on "scientific notation", where the value is approximated by "floating" the decimal point until only one digit is to the left of the decimal point, marking the magnitude by keeping track of the power of 10 necessary to restore the decimal point's location. Hence, the basic format has the form

  ±d.ddd... × 10^e    for example, -3.456 × 10^-23 or 5.12345 × 10^15

An arithmetic operation for numbers in this format utilizes the arithmetic operations for integers, but requires special handling for exponents and normalization. Normalization is the process of manipulating a result by adjusting the exponent, floating the decimal point until there is only one digit to its left.

Normalization examples:
  -123.456 × 10^-11 normalizes to -1.23456 × 10^-9   [normalize by adding 2 to the exponent]
  (the exponent has been increased by 2 to float the decimal point two positions to the left)
  0.0000012345 × 10^15 normalizes to 1.2345 × 10^9   [normalize by subtracting 6 from the exponent]
  (the exponent has been decreased by 6 to float the decimal point six positions to the right)

Multiplication and division are straightforward.

Multiplication and division examples:
  (2.01 × 10^-11) × (-9.3 × 10^16) = -18.693 × 10^(-11+16) = -1.8693 × 10^6
  [set the sign, multiply the mantissas, add the exponents, normalize and round]
  (2.01 × 10^-11) ÷ (-9.3 × 10^16) = -0.216129... × 10^(-11-16) = -2.1613 × 10^-28
  [set the sign, divide the mantissas, subtract the exponents, normalize and round]

Addition and subtraction require exponent manipulation, since the digits have to be lined up according to position.

Addition/subtraction example:
  (2.345 × 10^9) + (9.31 × 10^14) = (0.00002345 × 10^14) + (9.31 × 10^14) = 9.31002345 × 10^14
  [adjust the number with the smaller magnitude to match the exponent of the one with larger magnitude, then add/subtract the mantissas, normalize and round]

Another way to look at this is that addition and subtraction require moving the decimal point for the smaller number until the magnitudes of the two numbers match. In the computer context, base 10 is not the natural base to employ. In particular, on IBM mainframes (360 series), floating point numbers are hexadecimal based, using a 64-bit format developed by IBM for their systems. On these systems, IBM also employed its own character encoding format (EBCDIC). For obvious reasons, it is not advisable for a single manufacturer to dictate standard formats, so "neutral" groups, in which representatives from many manufacturers participate, develop and promulgate standards for general adoption by industry. Industry recognizes that lack of "standard" representation formats complicates the portability of data among systems. Systems that do not conform to standards eventually lose market appeal as more and more competing companies adopt recognized standards. As discussed earlier, the ASCII character encoding format has been widely adopted, and integers are now almost always represented in 2's complement rather than the 1's complement format. The most widely adopted floating point standard is the IEEE 754 standard. It employs the biased exponent concept used in IBM's format, but in contrast employs a base 2 format rather than hexadecimal. Note that it is the binary point that floats, rather than the decimal point. In contrast to 2's complement, there is no "natural" underlying finite algebra for floating point numbers. Hence, a sign-magnitude representation, with its implicit complications for managing arithmetic, is employed. For this reason, in early computational machines, floating point computations were almost always handled via software to hold down the size of the computational circuits. Floating point circuitry is now integrated into most processors and for almost all of them is compliant with the IEEE standard.

IEEE 754 Floating Point Standard

The IEEE 754 floating point standard provides a standard way of representing fractional quantities based on standard scientific notation (in base 2). The value of a number x is organized as

  (-1)^sign × 2^(exponent - bias) × 1.mantissa

with the following field layouts:

  32-bit single precision:   sign (1) │ exponent (8) │ mantissa (23)
    base 2 exponent biased by 127 (range -126 to 127)
    [true exponent is (exponent - 01111111)]

  64-bit double precision:   sign (1) │ exponent (11) │ mantissa (52)
    base 2 exponent biased by 1023 (range -1022 to 1023)
    [true exponent is (exponent - 01111111111)]

  80-bit extended precision: sign (1) │ exponent (15) │ mantissa (64)
    base 2 exponent biased by 16383 (range -16382 to 16383)
    [true exponent is (exponent - 011111111111111)]

An exponent of all 1's is used to show an exception: with a mantissa of 0 it represents ±∞, depending on the sign; otherwise the mantissa provides the designation for an illegal operation. For an exponent not all 0's (and not all 1's), the number is in normalized form, meaning the exponent and mantissa have been adjusted to produce a mantissa of the form 1.xxx ... xxx. In the representation, the leading 1 is an implied leading 1 (providing an extra bit of precision). This is the usual way numbers are represented in floating point. For an exponent of all 0's, the number is too small to be normalized and so is represented unnormalized. 0 is given by a mantissa of 0 and the minimum exponent (all 0's).

Example: 212.5625₁₀ = 11010100.1001₂ = 1.10101001001₂ × 2^7. The biased exponent is 127 + 7 = 134₁₀ = 10000110₂. In IEEE 32 bit format:

  0 10000110 10101001001000000000000
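The worked example can be checked with Python's struct module, which exposes the IEEE 754 single-precision bit pattern (a minimal added illustration, not part of the original notes):

  import struct

  bits = struct.unpack(">I", struct.pack(">f", 212.5625))[0]
  print(f"{bits:032b}")          # 01000011010101001001000000000000
  print((bits >> 23) & 0xFF)     # biased exponent: 134

Grouping the printed string as sign / exponent / mantissa gives 0 | 10000110 | 10101001001000000000000, matching the encoding above.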

Guard bits, rounding: Guard bits are extra bits maintained during intermediate steps to minimize loss of precision due to use of routine arithmetic operations and rounding. The implied 1 under the IEEE format limits precision loss under multiplication, since the result of multiplying mantissas will always be greater than or equal to 1. However, the simple multiplication of binary floating point values, 1.1₂ × 1.1₂ = 10.01₂, illustrates that a right shift may be needed to normalize the result (in this case to 1.001₂ × 2^1) and that the number of significant bits may double. A right shift may result in loss of precision since a significant bit may get shifted off of the end. By carrying an extra bit during intermediate steps, this effect can be countered. If an extra bit is carried for rounding, then an additional guard bit is needed to prevent an intermediate right shift from shifting away the rounding bit.

Rounding strategies: The rounding technique is important, because you don't want loss of precision to cascade to a significant error when multiple calculations are being performed; hence, to be viable, the rounding strategy must balance, rounding up half the time and the other half rounding down. Internally the IEEE mantissa (with its implied leading 1) is carried with two extra guard bits (protecting rounding and right shifts) in the form

  1.ddd ... d l ee

where l is the least significant retained bit and ee are the guard bits.

1. Truncation: As a strategy, truncation is not viable since it always rounds down (up if the number is negative).

2. Von Neumann rounding: The strategy is to always set the least significant bit l to 1; e.g.,
   1.ddd ... d000 and 1.ddd ... d001 round up to 1.ddd ... d1
   1.ddd ... d101 and 1.ddd ... d111 round down to 1.ddd ... d1
   i.e., half the time rounding is up and the other half it is down.

3. True rounding is the opposite of von Neumann rounding:
   1.ddd ... d l ee rounds up (add 1 to l) if ee = 11 or 10 (≥ 2)
   1.ddd ... d l ee rounds down (keep l)   if ee = 00 or 01 (< 2)
   Note that this simply requires adding the first guard bit to the least significant bit at the end of the computation.

Other considerations: Addition/subtraction can add to precision loss; for example, 1.111₂ - 1.110₂ = 0.001₂ = 1.000₂ × 2^-3 has operands with 4 significant figures and a result that has only 1. If significant figures disappeared via earlier computations in obtaining the operands, then data which could be present in the final result has been lost. This suggests that the results of extended calculations in floating point should be carried in the highest precision format available - a function of programming rather than hardware.

Rules for processing floating point numbers:

Multiplication: The format for floating point numbers,

  (-1)^s × 2^(e - bias) × 1.m

is implicitly multiplicative, so determining the result requires
  • XOR the sign bits.
  • Add the exponents: since (e1 - bias) + (e2 - bias) = (e1 + e2) - 2·bias, the hardware approach is to add the biased exponents obtained from the IEEE representation and subtract the bias.
  • Multiply the mantissas, including the implied 1, and round the result; if the product of the mantissas is 2 or more (assured when the first mantissa bit of each operand is 1), increment the exponent by 1 (corresponds to floating the binary point left by 1).

Division: Since

  [(-1)^s1 × 2^(e1-bias) × 1.m1] ÷ [(-1)^s2 × 2^(e2-bias) × 1.m2] = (-1)^(s1 ⊕ s2) × 2^(e1-e2) × (1.m1 / 1.m2)

the procedure is
  • XOR the sign bits.
  • Subtract the biased exponents obtained from the IEEE representation and add the bias.
  • Divide the mantissas, including the implied 1, and round the result; if the dividend is less than the divisor, decrement the exponent by 1 (corresponds to floating the binary point right by 1). Note: if the dividend is less than the divisor, the result is less than 1 and a normalization step is needed; however, the worst case scenario is a dividend of 1.0₂ and a divisor of 1.11...1₂, which yields a quotient greater than (1/2)₁₀, so normalization still only moves by 1 position.

Addition/Subtraction: these are easily implemented for integers, but require a good bit more attention for floating point:
  • Increment the smaller exponent to match the larger one and shift its mantissa (including the implied 1) to the right by the increment amount. Note that this is the opposite of normalizing.
  • Process addition/subtraction according to the signs of the two values, round, and normalize the result.

Register transfer logic: A computing device generally transforms data from one form to another over a series of steps. This is a characteristic of finite state automata, so in its concept a computing device is a (large) finite state machine. It is impractical to concoct a monolithic finite state automaton to describe a computer, so its architecture is instead described in terms of components and their interfaces. We have now seen how to construct sequential circuits that are large memory modules. We also have seen how specialized memory elements called registers can be used to provide the operands for data manipulation techniques such as arithmetic operations, comparison operations, shift operations, and the like. The results of such an operation, if not done directly on the register (such as happens with shift), can be captured in a target register. Conceptually, it appears wise to view memory and manipulation of data in different contexts, one for the storage and retrieval of data, and the other for performing operations on data. Registers are used to hold data retrieved from memory (or data ready to be stored in memory), where it can be accessed for data manipulation needs. Data can be easily moved from register to register; for example, to load two registers providing the operands for an adder circuit, or moved from a target register to a register designed to hold data ready to be stored in memory (i.e., a register whose outputs are connected to memory data lines). Register transfer logic organizes registers in a manner which provides means of moving data among registers for purposes of applying various data manipulations to the data contained within them. A register transfer architecture provides an abstracting realization of register transfer logic, conceptualizing data transfer and control in the following manner:

[Architecture diagram: data registers connected along a bus, with Memory and I/O attached; a control module (with its own internal working registers) receives status signals and control input, drives control signals to the registers and control output, and takes feedback from the bus; a clock synchronizes everything.]

There may be more than one control module deployed. Control signals may need to be generated from outside the control module or passed on to other modules. The clock synchronizes control and data elements and may be suspended by a control signal (e.g., to allow asynchronous transfer of data to or from memory). A register transfer language (RTL) provides a means for instantiating control modules and register elements for accomplishing a task. Transfer/control statements are executed sequentially with the clock. There is no standard register transfer language, but basic elements can be represented using the following notation:

1. data manipulation
  • transfer (assignment operator): A ← B
    copy (transfer non-destructively) the contents of register B to register A
  • access: A[i]
    access bit i of register A
  • operators: +, ⊕, ∧, =, ...
    apply bit-wise across either selected bits, or the whole register

2. control
  • conditional execution: (<condition>) <statement>
    example: (C) A ← B - if condition C=1, the transfer occurs; if C=0 it does not
  • branch: →[<condition>⋅<step> + ... + <condition>⋅<step>]
    the next step is changed to the first one in the branch statement having a true condition; if no conditions are true, don't branch (proceed to the next step)

Each step can have both data manipulation and control parts. Transfers expressed on one line are assumed to be parallel.

Example: assuming register bits are numbered left to right starting from 0, then

  A[0]←0, A[1]←A[0], A[2]←A[1], A[3]←A[2]

is a right shift by 1 of the bits referenced in register A (i.e., each bit is copied to its neighbor before it is reset).

Example: a fragment of a sequence of RTL steps

  Step_1: A ← B            (transfer B to A)
          →[A[0]⋅Step_3]   (if A[0]=1, branch to Step_3)
  Step_2: A ← A'           (complement A)
  Step_3: C ← A            (transfer A to C)

C receives either A or A' depending on the value of A[0].

An RTL program simply describes next state behavior, and so is a more abstract way to describe a circuit than can be accomplished using state diagrams. Operations (such as floating point arithmetic) which can be done in circuitry using sequential logic are one example of the kinds of circuitry that may be best described in RTL.

Example: Consider the RTL sequence

  C0: A ← B
  C1: A[0]←A[1], A[1]←A[2], A[2]←A[3], A[3]←A[0]   (left circular shift)
      →[SEQ⋅C0]
  C2: A[0]←A[0]+A[1], A[2]←A[2]+A[3]
  C3: A[1]←A[0] ⊕ A[1], A[3]←A[2] ⊕ A[3]
      →[C0]
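For what the sequence computes, here is a minimal Python sketch (an added illustration with SEQ = 0; bit 0 is taken as the leftmost bit, and + on 1-bit operands is taken as ADD truncated to 1 bit, the behavior of the UNFRTL version further below):

  def sequence(b):
      a = list(b)                                        # C0: A <- B
      a = a[1:] + a[:1]                                  # C1: left circular shift
      a[0], a[2] = (a[0] + a[1]) & 1, (a[2] + a[3]) & 1  # C2
      a[1], a[3] = a[0] ^ a[1], a[2] ^ a[3]              # C3
      return a

  print(sequence([1, 0, 1, 1]))   # [1, 0, 0, 1]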

The steps in the control sequence correspond to states, and so can be represented by using 2 flip-flops. The overall circuit then has the appearance:

[Overall circuit: two D flip-flops (Q1,Q0) plus control combinational logic (TBD) drive a 1 of 4 decoder producing the control signals C0, C1, C2, C3; these activate the transfer combinational logic (TBD) feeding the four D flip-flops of register A[0..3]. B supplies the input data, SEQ is the control input, the register outputs O0-O3 are the output data, and the register feeds back into the transfer logic. All flip-flops share the clock.]

Since 6 flip-flops are needed to describe the control logic and provide the 4-bit data register A, if a state diagram approach were employed, the circuit would require 2^6 = 64 states! The remaining work is to fill in the two combinational circuits noted as TBD. The program sequence occurs as follows:

  current control state │ next control state
                        │  SEQ=0    SEQ=1
  C0                    │   C1       C1
  C1                    │   C2       C0
  C2                    │   C3       C3
  C3                    │   C0       C0

Control combinational logic: From the control state transitions we can determine the control combinational logic using sequential circuit design, starting from a state diagram as follows:

[State diagram (edge labels are SEQ values): C0 → C1 on 0,1; C1 → C2 on 0 and → C0 on 1; C2 → C3 on 0,1; C3 → C0 on 0,1.]

Encoding the control states as (Q1,Q0) - C0 = 00, C1 = 01, C2 = 10, C3 = 11 - gives the transition table, where D1 and D0 are the D flip-flop inputs (i.e., Q1n and Q0n):

  state │ Q1 Q0 SEQ │ D1 D0
  C0    │  0  0  0  │  0  1
        │  0  0  1  │  0  1
  C1    │  0  1  0  │  1  0
        │  0  1  1  │  0  0
  C2    │  1  0  0  │  1  1
        │  1  0  1  │  1  1
  C3    │  1  1  0  │  0  0
        │  1  1  1  │  0  0

K-maps over Q1 × Q0SEQ then yield

  D1 = Q1·Q0' + Q1'·Q0·SEQ'
  D0 = Q0'

The circuit is then

[Control circuit: gate logic for D1 and D0 as above feeding the two D flip-flops, whose outputs Q1, Q0 drive the 1 of 4 decoder that raises C0-C3.]

Transfer combinational logic: There are 4 transfer statements, each of which requires its own combinational logic and each of which must be activated when its control signal (C0, C1, C2, C3) is raised. This is handled by using an AND gate with each control signal to activate/deactivate the appropriate transfer.

[Transfer combinational circuit: for each register bit A[i], a D flip-flop whose input is selected by AND/OR gating on the control signals - C0 routes the input data B; C1 routes the left-rotated neighbor bit; C2 routes A[0]+A[1] into A[0] and A[2]+A[3] into A[2]; C3 routes A[0]⊕A[1] into A[1] and A[2]⊕A[3] into A[3]. The register outputs O0-O3 are the output data and feed back into the gating; all flip-flops share the clock.]

Register transfers required:

  C0: A ← B
  C1: A[0]←A[1], A[1]←A[2], A[2]←A[3], A[3]←A[0]   (left circular shift)
  C2: A[0]←A[0]+A[1], A[2]←A[2]+A[3]
  C3: A[1]←A[0] ⊕ A[1], A[3]←A[2] ⊕ A[3]

UNF RTL: A Register-Transfer Language Simulator

High-level programming languages are usually portable across multiple environments, because they are designed to be used at a level of abstraction above physical implementation. They also tend to have a large user base. In contrast, RTL implementations (even more so than machine and assembly languages) tend to be tailored for a specific manufacturer's needs; i.e., there is no standard RTL. Elsewhere defined RTL circuit modules can also be employed (in the manner of subprograms) if there is a language context in which they are described. UNF RTL is an implementation of an RTL for a simulated machine environment. It has its own syntax and semantics, and can be used to verify register-transfer functionality for microcode-level algorithms. It does not incorporate any timing capabilities, which would normally be desirable in an implementation to be used for actual computer circuit construction. We will illustrate its functionality via a series of programs describing sequential circuits (including ones for specialized arithmetic).

I. UNF RTL: Basic Structure

An RTL program consists of the following three sections:
  1. DEFREG - define registers.
  2. DEFBUS - define buses.
  3. Control section bracketed by BEGIN and END.

For example,

  DEFREG:
    REG1(16)        ** REG1 is a 16 bit register
    REGISTER2(8)    ** REGISTER2 is an 8 bit register
    ACC(32)         ** ACC is a 32 bit register
  DEFBUS:
    MAINBUS(32)     ** MAINBUS is a 32 bit bus
    LASTBUS(8)      ** LASTBUS is an 8 bit bus
  BEGIN:
    ...             ** Register transfer and manipulation statements.
  END:

It is assumed that a transfer from one register to another does not require explicit representation of a bus structure. Defined buses are assumed to have a bus sense register to maintain any value transferred onto the bus. The purpose of having buses is to support communication among separately defined modules by explicitly representing the data path.

II. UNF RTL: Naming the Registers and Buses

DEFREG, DEFBUS, BEGIN, END, and the operator names (see next section) are reserved words. Register and bus names must start with an upper-case letter and may have up to twenty upper-case alphabetic

and numeric characters. The number enclosed in parentheses indicates the number of bits in the register being declared or the path width of the bus being defined (number of bits). For example, REG123XYZ(32) defines a register with the name REG123XYZ having 32 bits (bits 0, 1, 2, ..., 31). Bits in a register or bus are indexed from left to right beginning with 0.

III. UNF RTL: Labels, Conditional Execution, Conditional Branch, Merge

Statements in the control section (between the BEGIN and END brackets) may optionally start with a label and/or a condition:

  <label>: (<condition>) <RTL statement>

Examples:

  L1: (X[15 16] LEQ 1 0) X[0 TO 7] SETREG X[0 TO 7] SUB Y
  M23: REG1 SETREG REG2
  A[3 4] SETREG B[2 2]

Labels follow the same formation rules as those used for naming registers and buses. A label is terminated with a colon. A condition is an expression involving current contents of registers and buses and should evaluate to either 1 or 0. The statement following the condition is executed if the condition evaluates to 1, otherwise it is ignored. Statements without a "pre-condition" are executed when encountered. In addition to the conditional execution discussed above, there is a conditional branching capability. The syntax is as follows:

  <label>: (<condition>) BRANCH (<condition>;<label>)(<condition>;<label>) ... (<condition>;<label>)

The execution of the BRANCH statement is conditioned on the leading <condition> if present. If the BRANCH statement is executed, the (<condition>;<label>) pairs are considered from left to right and the first condition to evaluate to 1 causes a BRANCH to the corresponding label. If none of the conditions evaluate to 1, then execution proceeds to the next sequential line. An unconditional branch is provided to simulate a merging of control signals. The syntax is:

  <label>: (<condition>) MERGEAT <label>

Examples:

  BRANCH (SC ANEQ 0; L1) (X[0]; L2)
  MERGEAT TOP

IV. UNF RTL: Assignment Statements, Register Transfer, Expressions

Assignment statements simulate transfer of bit strings between registers and buses:

  <bus> SETBUS <expression>
  <register> SETREG <expression>

The expression on the right of the SETREG or SETBUS command indicates processing of current contents of registers and/or buses,

the result of which is transferred to the register or bus named on the left hand side of the SETREG or SETBUS command. For example,

  LASTBUS SETBUS REG34

indicates that the contents of REG34 are to be sent to LASTBUS;

  REG8 SETREG LASTBUS

means that the current set of signals on the LASTBUS is to be copied to REG8;

  REG9[7 8] SETREG R12[4 9] OR BUS10[2 30]

specifies that the sub-register REG9[7 8] (bits 7 and 8 of REG9) is to receive the result of bit-wise OR'ing the contents of the sub-register R12[4 9] and sub-bus BUS10[2 30]. An expression may be formed by applying the following rules:
  1. A binary vector is a term; e.g., 1 0 1 1 1 1 0 0
  2. A register name or a bus name is a term.
  3. A sub-register or a sub-bus is a term; e.g., R1[0 4 6 7] or BUS9[28 29 30 31]
  4. Concatenation of terms is a term (binary vectors must be enclosed in parentheses when involved in concatenation); concatenation is indicated by using a comma "," between terms; e.g., R1[4 5 6],(1 0 0 1 1),BUS27[16 17]
  5. A term (as defined in 1 through 4) is an expression.
  6. An expression enclosed in parentheses is a term.
  7. <term> <dyadic operator> <term> is an expression.
  8. <monadic operator> <term> is an expression.

Two reserved bus names (INBUS, OUTBUS) are used for simulated I/O. Expressions using these bus names provide simulated input (with prompt - from keyboard) and output (to screen with optional MESG text, if desired); their syntax is
  9. INBUS '<prompt text>'
     Either of the statements
       REG1 SETREG INBUS 'enter an 8 bit integer'
       REG5[0 TO 7] SETREG INBUS 'enter 8 bits'
     first sends the prompt message to the display, then accepts user input from the keyboard.
  10. OUTBUS SETBUS <expression> MESG '<message text>'
     where MESG is a reserved word, optionally included along with its '<message text>' to specify the addition of the message text to the display; e.g.,
       OUTBUS SETBUS REG3 MESG 'this is reg3'
     appends the message text to the display of the contents of REG3.

NOTE: INBUS is read only. OUTBUS is write only.

V. UNF RTL: Operators

A list of dyadic (requiring two operands) and monadic (requiring one operand) operators follows.

Dyadic Operators:

  Standard Boolean logic operations: OR, AND, NAND, NOR, XOR, COINC
    For example, 1 0 1 1 NOR 0 0 1 0 results in 0 1 0 0

  Logical and arithmetic shifts (left and right), rotate (circular shift, left and right):
  LLSHIFT, RLSHIFT, LASHIFT, RASHIFT, LROTATE, RROTATE
    For example,
      3 RLSHIFT 1 0 1 1 1 0 0 1 0 results in 0 0 0 1 0 1 1 1 0
      3 RASHIFT 1 0 1 1 1 0 0 1 0 results in 1 1 1 1 0 1 1 1 0
      3 RROTATE 1 0 1 1 1 0 0 1 0 results in 0 1 0 1 0 1 1 1 0

  Two's complement arithmetic: ADD, SUB, MUL, DIV
    For example, 0 0 1 1 1 MUL 0 0 1 1 0 results in 0 0 0 0 1 0 1 0 1 0

  Logical (unsigned) compare and arithmetic (signed) compare:
  LGT, LLT, LGE, LLE, LEQ, LNEQ and AGT, ALT, AGE, ALE, AEQ, ANEQ
    For example,
      1 1 0 0 1 0 LLE 0 0 0 1 0 1 results in 0
      1 1 0 0 1 0 ALE 0 0 0 1 0 1 results in 1

  String manipulation: FIRST, LAST
    For example,
      4 FIRST 1 0 1 1 0 0 0 1 results in 1 0 1 1
      4 LAST 1 0 1 1 0 0 0 1 results in 0 0 0 1

  Reformat of user input under INBUS: decTOtwo, hexTOtwo
    For example,
      8 decTOtwo -5 results in 1 1 1 1 1 0 1 1
      8 hexTOtwo A9 results in 1 0 1 0 1 0 0 1

Monadic Operators:

  Standard Boolean logic operations: NOT
    For example, NOT 1 0 1 0 1 1 results in 0 1 0 1 0 0

  Increment by 1, decrement by 1: INCREMENT, DECREMENT
    For example, INCREMENT 0 1 0 0 1 results in 0 1 0 1 0

  DECODE, ENCODE, twosCMPL, ZERO, twoTOdec, twoTOhex
    DECODE performs the function of a 1 of 2^n decoder, so DECODE 1 1 0 gives 0 0 0 0 0 0 1 0 (activating bit number 6 of the 8 bits)
    ENCODE is the inverse of DECODE, so ENCODE 0 0 0 0 0 1 0 0 results in 1 0 1

Page 110

   twosCMPL simply forms the 2's complement of its argument, so twosCMPL 1 1 1 0 1 results in 0 0 0 1 1
   ZERO returns a string of 0's of the given length, so ZERO 5 results in 0 0 0 0 0
   twoTOdec converts 2's complement to a decimal value for an address or output; e.g., twoTOdec 1 1 1 0 1 returns -3
   twoTOhex converts a binary string into hexadecimal notation; e.g., twoTOhex 1 1 1 0 1 returns 1D
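Since the shift and rotate operators recur throughout the programs that follow, it may help to see their semantics spelled out procedurally. The following short Python sketch is an illustration only (the function names are invented; this is not part of the UNF RTL simulator); it models RLSHIFT, RASHIFT, and RROTATE on bit lists, assuming 0 < n < len(bits), and reproduces the examples above.

   def rlshift(n, bits):
       # logical right shift by n: vacated positions fill with 0
       return [0] * n + bits[:-n]

   def rashift(n, bits):
       # arithmetic right shift by n: vacated positions fill with the sign bit
       return [bits[0]] * n + bits[:-n]

   def rrotate(n, bits):
       # circular right shift by n: bits shifted off the right re-enter on the left
       return bits[-n:] + bits[:-n]

   v = [1, 0, 1, 1, 1, 0, 0, 1, 0]
   print(rlshift(3, v))   # [0, 0, 0, 1, 0, 1, 1, 1, 0]
   print(rashift(3, v))   # [1, 1, 1, 1, 0, 1, 1, 1, 0]
   print(rrotate(3, v))   # [0, 1, 0, 1, 0, 1, 1, 1, 0]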

VI. Evaluation of Conditions

A condition is either an expression (as defined in section IV) or two expressions connected by one of the comparison operators. A condition may appear as a "pre-condition" (in front of any statement) or as the first component in a (<condition>;<label>) pair. If a condition takes the form of an expression without a comparison operator, it should evaluate to a 1 or 0. If a logical comparison operator is used, the resulting bit strings on both sides of the comparison operator are treated as unsigned integers in making the comparison. Arithmetic comparisons treat the operands under the assumption that they are in 2's complement representation.

Page 111

UNFRTL Examples

Generic RTL example of a simple register transfer sequence:

   C0: A ← B
   C1: A[0]←A[1], A[1]←A[2], A[2]←A[3], A[3]←A[0]   (left circular shift)
       →[SEQ⋅C0]
   C2: A[0]←A[0]+A[1], A[2]←A[2]+A[3]
   C3: A[1]←A[0]⊕A[1], A[3]←A[2]⊕A[3]
       →[C0]

UNFRTL program providing an implementation of the sequence:

   [0] RtlSIMPLX
   [1] DEFREG:
   [2] SEQ(1)
   [3] A(4)
   [4] B(4)
   [5] DEFBUS:
   [6] BEGIN:
   [7] C0:B SETREG INBUS 'Enter 4 bit B input'
   [8] SEQ SETREG INBUS 'Enter SEQ value'
   [9] A SETREG B
   [10] C1:A SETREG 1 LROTATE A
   [11] OUTBUS SETBUS A MESG 'Register A left rotated by 1 - '
   [12] BRANCH(SEQ;C0)
   [13] C2:A[0 2] SETREG ((A[0] ADD A[1]), (A[2] ADD A[3]))
   [14] OUTBUS SETBUS A MESG 'Register A with A[0 2] added - '
   [15] C3:A[1 3] SETREG ((A[0] XOR A[1]), (A[2] XOR A[3]))
   [16] OUTBUS SETBUS A MESG 'Register A with A[1 3] XORed - '
   [17] MERGEAT C0
   [18] END:

Lines 9, 10, 13, 15 are the statements providing the actual register transfers specified by C0, C1, C2, C3.

Page 112

Signed multiply:

Architecture: 3 n-bit registers X, A, Y; 1-bit register SGN. (A,Y) can be treated as a single 2n-bit register for shifting.

[Block diagram: the X register (multiplicand) feeds an adder (ADD) whose output loads the A register (accumulator); (A,Y) shifts right, with 0 entering at the left of A; bit Y0 of the Y register (multiplier) controls whether the add takes place; SGN holds the product's sign, and a shift counter counts the shifts.]

X = multiplicand, Y = multiplier, A = accumulator

The sign of the product is first determined by (sgn(X) ⊕ sgn(Y)) and stored in SGN. X and Y are changed to their absolute values so that the arithmetic only has to deal with positive integers.

Basic procedure for multiplying (positive) integers X and Y:

   Clear A
   X ← <multiplicand>
   Y ← <multiplier>
   REPEAT
      IF Y0 = 1
         A ← A + X
      ENDIF
      Shift (A,Y) right by 1 bit
   UNTIL there have been n shifts

When done, the product will be in (A,Y). The 2's complement form is then produced according to the sign value found in SGN.
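The procedure translates almost directly into ordinary code. The following Python sketch is an illustration only, not the UNFRTL program on the next page; it carries out the shift-add loop on nonnegative n-bit magnitudes, treating (A,Y) as a single 2n-bit value, with the sign assumed to be handled separately as described above.

   def multiply_magnitudes(x, y, n=8):
       mask = (1 << n) - 1
       a = 0                            # accumulator A
       for _ in range(n):
           if y & 1:                    # Y0 = 1: add the multiplicand into A
               a = a + x
           ay = ((a << n) | y) >> 1     # shift (A,Y) right by 1 bit
           a, y = ay >> n, ay & mask
       return (a << n) | y              # the 2n-bit product sits in (A,Y)

   print(multiply_magnitudes(10, 11))   # 110, matching the trace on the next page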

Page 113

UNFRTL program for implementing the procedure for multiplication with extensions for accommodating the sign; X and Y are assumed to be 2's complement sign + 7 integers.

[0] RtlSMULTIPLY
[1] DEFREG:
[2] AY(16)
[3] X(8)
[4] SC(8)
[5] SGN(1)
[6] DEFBUS:
[7] BEGIN:
[8] AY[0 TO 7] SETREG ZERO 8 ** Clear accumulator A
[9] SC SETREG 0 0 0 0 1 0 0 0 ** Shift counter initially 8
[10] X SETREG 8 INBUS 'Enter multiplicand (8 bit 2''s complement)'
[11] AY[8 TO 15] SETREG 8 INBUS 'Multiplier (8 bit 2''s)'
[12] SGN SETREG X[0] XOR AY[8] ** Set sign bit for the product
[13] BRANCH(NOT X[0]; CKY)
[14] X SETREG twosCMPL X ** change sign of X if X < 0
[15] CKY:BRANCH(NOT AY[8]; L) ** and do likewise for Y
[16] AY[8 TO 15] SETREG twosCMPL AY[8 TO 15]
[17]** Accumulate in A if rightmost bit of Y = 1 (recall: AY[15] ≡ Y0)
[18] L:(AY[15]) AY[0 TO 7] SETREG AY[0 TO 7] ADD X
[19] OUTBUS SETBUS AY MESG 'REG AY '
[20] AY SETREG 1 RLSHIFT AY ** Shift AY right
[21] OUTBUS SETBUS AY MESG 'shf AY '
[22] SC SETREG DECREMENT SC ** Decrement shift counter
[23] BRANCH(SC ANEQ 0; L) ** Repeat if shift counter ≠ 0
[24] AY SETREG 1 RLSHIFT AY ** Shift to clear the sign bit
[25] BRANCH(NOT SGN; D)
[26] AY[0 TO 15] SETREG twosCMPL AY[0 TO 15]
[27] D:OUTBUS SETBUS AY[0 TO 15] MESG 'PRODUCT '
[28] OUTBUS SETBUS (twoTOdec AY[0 TO 15]) MESG '(base 10)'
[29] END:

Execution trace: RtlSMULTIPLY (input data is -10 and 11)

Enter multiplicand (8 bit 2's complement): 1 1 1 1 0 1 1 0
Multiplier (8 bit 2's): 0 0 0 0 1 0 1 1
REG AY ( 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 1 )
shf AY ( 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 )
REG AY ( 0 0 0 0 1 1 1 1 0 0 0 0 0 1 0 1 )
shf AY ( 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 0 )
REG AY ( 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 0 )
shf AY ( 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 )
REG AY ( 0 0 0 0 1 1 0 1 1 1 0 0 0 0 0 1 )
shf AY ( 0 0 0 0 0 1 1 0 1 1 1 0 0 0 0 0 )
REG AY ( 0 0 0 0 0 1 1 0 1 1 1 0 0 0 0 0 )
shf AY ( 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 0 )
REG AY ( 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 0 )
shf AY ( 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 )
REG AY ( 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 )
shf AY ( 0 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 )
REG AY ( 0 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 )
shf AY ( 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 0 )
PRODUCT ( 1 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 )
(base 10) ( -110 )

Page 114

Booth's method for multiplying 2's complement integers:

Architecture: 3 n-bit registers X, A, Y; a 1-bit register P. (A,Y,P) can be treated as a single 2n+1 bit register for shifting.

[Block diagram: the X register (multiplicand) feeds an adder/subtractor (ADD/SUB) whose output loads the A register (accumulator); (A,Y,P) shifts right arithmetically; the bit pair (Y0,P0) controls whether an add or a subtract takes place, where P holds the prior bit of the multiplier Y; a shift counter counts the n shifts.]

X = multiplicand, Y = multiplier, A = accumulator, P = prior bit from multiplier

In contrast to the "Signed Multiply" procedure, Booth's method requires no independent consideration of the sign of the multiplicand and multiplier. The basic procedure for multiplying 2's complement integers X and Y is as follows:

   Clear A
   X ← <multiplicand>
   Y ← <multiplier>
   P ← 0
   REPEAT
      CASE
         (Y0,P0) = 1 0:  A ← A - X
         (Y0,P0) = 0 1:  A ← A + X
      ENDCASE
      Shift (A,Y,P) right arithmetically by 1 bit
   UNTIL there have been n shifts

When done, the product will be in (A,Y).

Remark: The first time a 1 appears in position Y0, X will be subtracted. If the next value to appear in Y0 is 0, X will then be added. Because of the shift, the effect is equivalent to having added 2X at the preceding step, which then has the combined effect over the two steps of adding 2X - X = X. If the next value to appear in Y0 had been 1, and then 0 following that, then two shifts would take place before adding X, yielding a combined effect of 4X - X = 3X over the three steps (note that multiplying by 1 1 is the same as multiplying by 3, so adding 3X is exactly what is desired). Thus, the procedure produces the desired outcome for patterns in the multiplier of 0 1 0, 0 1 1 0, 0 1 1 1 0, ..., allowing us to conclude that it will work in general.

Note that the procedure works regardless of sign. If the multiplier is negative, its lead bits are 1's, and so the procedure simply winds out with a series of

Page 115

shifts once it gets into the leading 1's of the multiplier. Similarly, if the multiplier is positive, its lead bits are 0's and the procedure likewise winds out with a series of shifts once it gets into the leading 0's of the multiplier.
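Before tracing the method by hand, it can be checked in ordinary code. The Python sketch below is an illustration only (booth_multiply is an invented name, not part of the UNF RTL simulator); it holds the n-bit operands as bit patterns, mirrors the (A,Y,P) register exactly, and reproduces the (-11) × 19 example traced next.

   def booth_multiply(x, y, n=8):
       # x, y are n-bit two's complement bit patterns (ints in 0..2^n - 1)
       mask = (1 << n) - 1
       a, p = 0, 0
       for _ in range(n):
           y0 = y & 1
           if (y0, p) == (1, 0):
               a = (a - x) & mask                 # A <- A - X
           elif (y0, p) == (0, 1):
               a = (a + x) & mask                 # A <- A + X
           # arithmetic right shift of the combined (A,Y,P) register
           p = y0
           y = (y >> 1) | ((a & 1) << (n - 1))    # A's low bit enters Y
           a = (a >> 1) | (a & (1 << (n - 1)))    # A's sign bit is replicated
       return (a << n) | y                        # 2n-bit two's complement product

   prod = booth_multiply((-11) & 0xFF, 19 & 0xFF)
   print(prod - (1 << 16) if prod >> 15 else prod)   # -209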

Trace of Booth's method: 8 bit registers, (-11)10 × (19)10 = (-209)10

X = 1 1 1 1 0 1 0 1    -X = 0 0 0 0 1 0 1 1
(: marks the bits shifted into Y from A)

                                      A                Y                P
initially                             0 0 0 0 0 0 0 0  0 0 0 1 0 0 1 1  0
(Y0,P0) = 1 0: subtract (A ← A - X)   0 0 0 0 1 0 1 1  0 0 0 1 0 0 1 1  0
then shift right 1 (arithmetic)       0 0 0 0 0 1 0 1  1:0 0 0 1 0 0 1  1
(Y0,P0) = 1 1: shift right 1 only     0 0 0 0 0 0 1 0  1 1:0 0 0 1 0 0  1
(Y0,P0) = 0 1: add (A ← A + X)        1 1 1 1 0 1 1 1  1 1:0 0 0 1 0 0  1
then shift right 1 (arithmetic)       1 1 1 1 1 0 1 1  1 1 1:0 0 0 1 0  0
(Y0,P0) = 0 0: shift right 1 only     1 1 1 1 1 1 0 1  1 1 1 1:0 0 0 1  0
(Y0,P0) = 1 0: subtract (A ← A - X)   0 0 0 0 1 0 0 0  1 1 1 1:0 0 0 1  0
then shift right 1 (arithmetic)       0 0 0 0 0 1 0 0  0 1 1 1 1:0 0 0  1
(Y0,P0) = 0 1: add (A ← A + X)        1 1 1 1 1 0 0 1  0 1 1 1 1:0 0 0  1
then shift right 1 (arithmetic)       1 1 1 1 1 1 0 0  1 0 1 1 1 1:0 0  0
(Y0,P0) = 0 0: shift right 1 only     1 1 1 1 1 1 1 0  0 1 0 1 1 1 1:0  0
(Y0,P0) = 0 0: shift right 1 only     1 1 1 1 1 1 1 1  0 0 1 0 1 1 1 1  0

Product (A,Y) = 1 1 1 1 1 1 1 1 0 0 1 0 1 1 1 1 (2) = (-209)10

Page 116

UNFRTL program for implementing Booth's procedure for multiplication; X and Y are assumed to be 2's complement sign + 7 integers.

[0] RtlBOOTHMULT
[1] DEFREG:
[2] AYP(17)
[3] X(8)
[4] SC(8)
[5] DEFBUS:
[6] BEGIN:
[7] AYP[0 TO 7] SETREG ZERO 8 ** Clear accumulator A
[8] SC SETREG 0 0 0 0 1 0 0 0 ** Shift counter initially 8
[9] X SETREG 8 INBUS 'Enter Multiplicand (8 bit 2''s complement)'
[10] AYP[8 TO 15] SETREG 8 INBUS 'Multiplier (8 bit 2''s)'
[11] AYP[16] SETREG 0 ** initialize P to 0
[12] OUTBUS SETBUS (twoTOdec X) MESG '(base 10 Multiplicand)'
[13] OUTBUS SETBUS (twoTOdec AYP[8 TO 15]) MESG '(base 10 Multiplier)'
[14]** Cases: (recall: AYP[15] ≡ Y0 and AYP[16] ≡ P0)
[15] L:(AYP[15 16] LEQ 1 0) AYP[0 TO 7] SETREG AYP[0 TO 7] SUB X
[16] (AYP[15 16] LEQ 0 1) AYP[0 TO 7] SETREG AYP[0 TO 7] ADD X
[17] OUTBUS SETBUS AYP MESG 'REG AYP '
[18] AYP SETREG 1 RASHIFT AYP ** right arithmetic shift
[19] OUTBUS SETBUS AYP MESG 'shf AYP '
[20] SC SETREG DECREMENT SC ** Decrement shift counter
[21] BRANCH(SC ANEQ 0; L) ** Repeat if shift counter ≠ 0
[22] D:OUTBUS SETBUS AYP[0 TO 15] MESG 'PRODUCT '
[23] OUTBUS SETBUS (twoTOdec AYP[0 TO 15]) MESG '(base 10)'
[24] END:

Execution trace: RtlBOOTHMULT (input data is -11 and 19)

Enter 8 bit Multiplicand: 1 1 1 1 0 1 0 1
Enter 8 bit Multiplier : 0 0 0 1 0 0 1 1
(base 10 Multiplicand) ( -11 )
(base 10 Multiplier) ( 19 )
REG AYP ( 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 1 0 )
shf AYP ( 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 1 )
REG AYP ( 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 1 )
shf AYP ( 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 )
REG AYP ( 1 1 1 1 0 1 1 1 1 1 0 0 0 1 0 0 1 )
shf AYP ( 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 0 0 )
REG AYP ( 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 0 0 )
shf AYP ( 1 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 0 )
REG AYP ( 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 1 0 )
shf AYP ( 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 1 )
REG AYP ( 1 1 1 1 1 0 0 1 0 1 1 1 1 0 0 0 1 )
shf AYP ( 1 1 1 1 1 1 0 0 1 0 1 1 1 1 0 0 0 )
REG AYP ( 1 1 1 1 1 1 0 0 1 0 1 1 1 1 0 0 0 )
shf AYP ( 1 1 1 1 1 1 1 0 0 1 0 1 1 1 1 0 0 )
REG AYP ( 1 1 1 1 1 1 1 0 0 1 0 1 1 1 1 0 0 )
shf AYP ( 1 1 1 1 1 1 1 1 0 0 1 0 1 1 1 1 0 )
PRODUCT ( 1 1 1 1 1 1 1 1 0 0 1 0 1 1 1 1 )
(base 10) ( -209 )

Page 117

Restoring and non-restoring division:

Architecture: 3 n-bit registers A, X, Y; 1-bit sign registers SGNQ and SGNR. (A,X) can be treated as a single 2n-bit register for shifting.

[Block diagram: the Y register (divisor) feeds an adder/subtractor (ADD/SUB) whose output loads the A register; (A,X) shifts left, with the sign of A determining the value placed in the vacated bit X0 of the X register; SGNQ and SGNR hold the signs of the quotient and remainder, and a shift counter counts the n shifts.]

Sign rules:
   <dividend> = <quotient> * <divisor> + <remainder>
   sign(<quotient>) = sign(<dividend>) ⊕ sign(<divisor>)
   sign(<remainder>) = sign(<quotient>) ⊕ sign(<divisor>)
(for instance, 6/-5 = -1 r 1; -6/-5 = 1 r -1)

Using these rules, the sign of the quotient is stored in SGNQ and that of the remainder in SGNR.

The basic procedure for restoring division, positive integers X and Y, is as follows:

   Clear A
   X ← <dividend>
   Y ← <divisor>
   REPEAT
      Shift (A,X) left by 1 bit
      A ← A - Y
      IF A < 0
         A ← A + Y       /* "restore" A */
         X0 ← 0          /* set least significant bit of X */
      ELSE
         X0 ← 1
      ENDIF
   UNTIL the register has been shifted n times

When the algorithm terminates, register A has the <remainder> and register X has the <quotient> (register Y, the <divisor>, is unchanged). At this point, the values in SGNQ and SGNR are used to establish the correct 2's complement form for the quotient and the remainder.
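As with the multiply procedures, this translates directly into ordinary code. The Python sketch below is an illustration only (restoring_divide is an invented name); it works on nonnegative integers and mirrors the shift/subtract/restore loop above; it reproduces the 74 / 25 trace that follows.

   def restoring_divide(x, y, n=8):
       # x = dividend, y = divisor, both nonnegative n-bit integers
       mask = (1 << n) - 1
       a = 0                             # A starts cleared
       for _ in range(n):
           a = 2 * a + (x >> (n - 1))    # shift (A,X) left; X's top bit enters A
           x = (x << 1) & mask           # the vacated bit X0 is 0
           a = a - y
           if a < 0:
               a = a + y                 # "restore" A; X0 stays 0
           else:
               x = x | 1                 # X0 <- 1
       return x, a                       # (quotient, remainder)

   print(restoring_divide(74, 25))       # (2, 24)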

Page 118

Trace of restoring division: 8 bit registers, (74)10 / (25)10 = 2 r 24

Y = 0 0 0 1 1 0 0 1    -Y = 1 1 1 0 0 1 1 1
(? marks the vacated bit of X before it is set; : marks the quotient bits so far)

           A                X
initially  0 0 0 0 0 0 0 0  0 1 0 0 1 0 1 0
ShiftL     0 0 0 0 0 0 0 0  1 0 0 1 0 1 0 ?
Sub Y      1 1 1 0 0 1 1 1  1 0 0 1 0 1 0 ?    ← A < 0; set vacated bit to 0 (and restore A)
restore    0 0 0 0 0 0 0 0  1 0 0 1 0 1 0:0
ShiftL     0 0 0 0 0 0 0 1  0 0 1 0 1 0:0 ?
Sub Y      1 1 1 0 1 0 0 0  0 0 1 0 1 0:0 ?    ← A < 0; set vacated bit to 0 (and restore A)
restore    0 0 0 0 0 0 0 1  0 0 1 0 1 0:0 0
ShiftL     0 0 0 0 0 0 1 0  0 1 0 1 0:0 0 ?
Sub Y      1 1 1 0 1 0 0 1  0 1 0 1 0:0 0 ?    ← A < 0; set vacated bit to 0 (and restore A)
restore    0 0 0 0 0 0 1 0  0 1 0 1 0:0 0 0
ShiftL     0 0 0 0 0 1 0 0  1 0 1 0:0 0 0 ?
Sub Y      1 1 1 0 1 0 1 1  1 0 1 0:0 0 0 ?    ← A < 0; set vacated bit to 0 (and restore A)
restore    0 0 0 0 0 1 0 0  1 0 1 0:0 0 0 0
ShiftL     0 0 0 0 1 0 0 1  0 1 0:0 0 0 0 ?
Sub Y      1 1 1 1 0 0 0 0  0 1 0:0 0 0 0 ?    ← A < 0; set vacated bit to 0 (and restore A)
restore    0 0 0 0 1 0 0 1  0 1 0:0 0 0 0 0
ShiftL     0 0 0 1 0 0 1 0  1 0:0 0 0 0 0 ?
Sub Y      1 1 1 1 1 0 0 1  1 0:0 0 0 0 0 ?    ← A < 0; set vacated bit to 0 (and restore A)
restore    0 0 0 1 0 0 1 0  1 0:0 0 0 0 0 0
ShiftL     0 0 1 0 0 1 0 1  0:0 0 0 0 0 0 ?
Sub Y      0 0 0 0 1 1 0 0  0:0 0 0 0 0 0 ?    ← A > 0; set vacated bit to 1
set 1      0 0 0 0 1 1 0 0  0:0 0 0 0 0 0 1
ShiftL     0 0 0 1 1 0 0 0  :0 0 0 0 0 0 1 ?
Sub Y      1 1 1 1 1 1 1 1  :0 0 0 0 0 0 1 ?   ← A < 0; set vacated bit to 0 (and restore A)
restore    0 0 0 1 1 0 0 0  :0 0 0 0 0 0 1 0

When done: X = 0 0 0 0 0 0 1 0 (quotient 2), A = 0 0 0 1 1 0 0 0 (remainder 24).

Page 119

UNFRTL program for implementing the procedure for restoring division with extensions for accommodating the sign; X and Y are assumed to be 2's complement sign + 7 integers.

[0] RtlRESTORING
[1] DEFREG:
[2] AX(16)
[3] Y(8)
[4] SC(8)
[5] SGNQ(1)
[6] SGNR(1)
[7] DEFBUS:
[8] BEGIN:
[9]** Initialize first half of AX register to zeroes
[10] AX[0 TO 7] SETREG ZERO 8
[11]** Initialize shift counter to 8
[12] SC SETREG 0 0 0 0 1 0 0 0
[13] AX[8 TO 15] SETREG 8 INBUS 'Enter 8 bit dividend (2''s comp)'
[14] Y SETREG 8 INBUS 'Enter 8 bit divisor (2''s comp)'
[15] SGNQ SETREG AX[8] XOR Y[0]
[16] SGNR SETREG SGNQ XOR Y[0]
[17] BRANCH(NOT AX[8]; CKY)
[18] AX[8 TO 15] SETREG twosCMPL AX[8 TO 15]
[19] CKY:BRANCH(NOT Y[0]; L)
[20] Y SETREG twosCMPL Y
[21] L:AX SETREG 1 LLSHIFT AX
[22] SC SETREG SC SUB 0 0 0 0 0 0 0 1
[23] OUTBUS SETBUS AX MESG 'shf AX '
[24] AX[0 TO 7] SETREG AX[0 TO 7] SUB Y
[25] OUTBUS SETBUS AX MESG 'SUB '
[26] BRANCH(AX[0 TO 7] ALT 0; RESTORE)
[27] AX[15] SETREG 1
[28] OUTBUS SETBUS AX MESG 'set 1 '
[29] MERGEAT TST
[30] RESTORE:AX[0 TO 7] SETREG AX[0 TO 7] ADD Y
[31] OUTBUS SETBUS AX MESG 'restore '
[32] TST:BRANCH(SC AGT ZERO 8; L)
[33] BRANCH(NOT SGNR; CKQ)
[34] AX[0 TO 7] SETREG twosCMPL AX[0 TO 7]
[35] CKQ:BRANCH(NOT SGNQ; D)
[36] AX[8 TO 15] SETREG twosCMPL AX[8 TO 15]
[37] D:OUTBUS SETBUS AX[8 TO 15] MESG 'QUOTIENT '
[38] OUTBUS SETBUS (twoTOdec AX[8 TO 15]) MESG '(base 10)'
[39] OUTBUS SETBUS AX[0 TO 7] MESG 'REMAINDER'
[40] OUTBUS SETBUS (twoTOdec AX[0 TO 7]) MESG '(base 10)'
[41] END:

Page 120

Execution trace: RtlRESTORING (input data is 74 and 25)

Enter 8 bit dividend (2's comp): 0 1 0 0 1 0 1 0
Enter 8 bit divisor (2's comp): 0 0 0 1 1 0 0 1
shf AX   ( 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 )
SUB      ( 1 1 1 0 0 1 1 1 1 0 0 1 0 1 0 0 )
restore  ( 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 )
shf AX   ( 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 )
SUB      ( 1 1 1 0 1 0 0 0 0 0 1 0 1 0 0 0 )
restore  ( 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 )
shf AX   ( 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 )
SUB      ( 1 1 1 0 1 0 0 1 0 1 0 1 0 0 0 0 )
restore  ( 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 )
shf AX   ( 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 )
SUB      ( 1 1 1 0 1 0 1 1 1 0 1 0 0 0 0 0 )
restore  ( 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 )
shf AX   ( 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 )
SUB      ( 1 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 )
restore  ( 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 )
shf AX   ( 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 )
SUB      ( 1 1 1 1 1 0 0 1 1 0 0 0 0 0 0 0 )
restore  ( 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 )
shf AX   ( 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 )
SUB      ( 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 )
set 1    ( 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 )
shf AX   ( 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 )
SUB      ( 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 )
restore  ( 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 )
QUOTIENT  ( 0 0 0 0 0 0 1 0 )
(base 10) ( 2 )
REMAINDER ( 0 0 0 1 1 0 0 0 )
(base 10) ( 24 )

Page 121

The basic procedure for non-restoring division, positive integers X and Y, is as follows:

   Clear A
   X ← <dividend>
   Y ← <divisor>
   Shift (A,X) left by 1 bit
   A ← A - Y
   REPEAT
      IF A < 0
         X0 ← 0          /* set least significant bit of X */
         Shift (A,X) left by 1 bit
         A ← A + Y       /* Remark: see below */
      ELSE
         X0 ← 1
         Shift (A,X) left by 1 bit
         A ← A - Y
      ENDIF
   UNTIL the register has been shifted n times (including the initial shift)
   IF A < 0
      X0 ← 0
      A ← A + Y          /* the only time A is "restored" */
   ELSE
      X0 ← 1
   ENDIF

When the algorithm terminates, register A has the <remainder> and register X has the <quotient> (register Y, the <divisor>, is unchanged).

Remark: Shifting and then adding Y as done above is equivalent to adding Y (to restore A), then shifting, and then subtracting Y as done in the restoring algorithm. This is true because a left shift has the effect of multiplying by 2; i.e.,
1. in the non-restoring algorithm when A < 0: Y was subtracted initially; a shift has the effect that 2Y is now subtracted; adding Y leaves the effect of a single subtraction of Y for the next pass (without having to restore!).
2. in the restoring algorithm when A < 0: Y was subtracted initially; Y is added back to restore; a shift now has no effect on the Y arithmetic; Y must now be explicitly subtracted for the next pass.
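The same style of Python sketch works for the non-restoring variant (again an illustration with invented names, not the RTL program on page 123); note the single restore at the very end.

   def nonrestoring_divide(x, y, n=8):
       # x = dividend, y = divisor, both nonnegative n-bit integers
       mask = (1 << n) - 1
       a = 0
       def shift_left():
           nonlocal a, x
           a = 2 * a + (x >> (n - 1))   # X's top bit shifts into A
           x = (x << 1) & mask          # vacated X0 is 0
       shift_left()
       a = a - y                        # initial shift and subtract
       for _ in range(n - 1):
           if a < 0:
               shift_left()             # X0 left at 0 by the shift
               a = a + y
           else:
               x = x | 1                # X0 <- 1
               shift_left()
               a = a - y
       if a < 0:
           a = a + y                    # the only time A is "restored"
       else:
           x = x | 1
       return x, a                      # (quotient, remainder)

   print(nonrestoring_divide(74, 25))   # (2, 24)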

Page 122

Trace of non-restoring division: 8 bit registers, (74)10 / (25)10

Y = 0 0 0 1 1 0 0 1    -Y = 1 1 1 0 0 1 1 1
(? marks the vacated bit of X before it is set; : marks the quotient bits so far)

           A                X
initially  0 0 0 0 0 0 0 0  0 1 0 0 1 0 1 0
ShiftL     0 0 0 0 0 0 0 0  1 0 0 1 0 1 0 ?
Sub Y      1 1 1 0 0 1 1 1  1 0 0 1 0 1 0 ?    ← A < 0; set vacated bit to 0
ShiftL     1 1 0 0 1 1 1 1  0 0 1 0 1 0:0 ?
Add Y      1 1 1 0 1 0 0 0  0 0 1 0 1 0:0 ?    ← A < 0; set vacated bit to 0
ShiftL     1 1 0 1 0 0 0 0  0 1 0 1 0:0 0 ?
Add Y      1 1 1 0 1 0 0 1  0 1 0 1 0:0 0 ?    ← A < 0; set vacated bit to 0
ShiftL     1 1 0 1 0 0 1 0  1 0 1 0:0 0 0 ?
Add Y      1 1 1 0 1 0 1 1  1 0 1 0:0 0 0 ?    ← A < 0; set vacated bit to 0
ShiftL     1 1 0 1 0 1 1 1  0 1 0:0 0 0 0 ?
Add Y      1 1 1 1 0 0 0 0  0 1 0:0 0 0 0 ?    ← A < 0; set vacated bit to 0
ShiftL     1 1 1 0 0 0 0 0  1 0:0 0 0 0 0 ?
Add Y      1 1 1 1 1 0 0 1  1 0:0 0 0 0 0 ?    ← A < 0; set vacated bit to 0
ShiftL     1 1 1 1 0 0 1 1  0:0 0 0 0 0 0 ?
Add Y      0 0 0 0 1 1 0 0  0:0 0 0 0 0 0 ?    ← A > 0; set vacated bit to 1
ShiftL     0 0 0 1 1 0 0 0  :0 0 0 0 0 0 1 ?
Sub Y      1 1 1 1 1 1 1 1  :0 0 0 0 0 0 1 ?   ← A < 0; set vacated bit to 0 and restore A
restore    0 0 0 1 1 0 0 0  :0 0 0 0 0 0 1 0

When done: X = 0 0 0 0 0 0 1 0 (quotient 2), A = 0 0 0 1 1 0 0 0 (remainder 24).

Page 123

UNFRTL program for implementing the procedure for non-restoring division with extensions for accommodating the sign; X and Y are assumed to be 2's complement sign + 7 integers.

[0] RtlNRESTORE
[1] DEFREG:
[2] AX(16)
[3] Y(8)
[4] SC(8)
[5] SGNQ(1)
[6] SGNR(1)
[7] DEFBUS:
[8] BEGIN:
[9]** Initialize first half of AX register to zeroes
[10] AX[0 TO 7] SETREG ZERO 8
[11]** Initialize shift counter to 8
[12] SC SETREG 0 0 0 0 1 0 0 0
[13] AX[8 TO 15] SETREG 8 INBUS 'Enter 8 bit Dividend'
[14] Y SETREG 8 INBUS 'Enter 8 bit Divisor '
[15] SGNQ SETREG AX[8] XOR Y[0]
[16] SGNR SETREG SGNQ XOR Y[0]
[17] (AX[8]) AX[8 TO 15] SETREG twosCMPL AX[8 TO 15]
[18] (Y[0]) Y SETREG twosCMPL Y
[19] AX SETREG 1 LLSHIFT AX
[20] OUTBUS SETBUS AX MESG 'shf AX '
[21] AX[0 TO 7] SETREG AX[0 TO 7] SUB Y
[22] OUTBUS SETBUS AX MESG 'SUB '
[23] L:SC SETREG SC SUB 0 0 0 0 0 0 0 1
[24] BRANCH(SC ALE ZERO 8; CHK)
[25] (AX[0]) MERGEAT ADDY
[26] AX[15] SETREG 1
[27] OUTBUS SETBUS AX MESG 'set 1 '
[28] AX SETREG 1 LLSHIFT AX
[29] OUTBUS SETBUS AX MESG 'shf AX '
[30] AX[0 TO 7] SETREG AX[0 TO 7] SUB Y
[31] OUTBUS SETBUS AX MESG 'SUB '
[32] MERGEAT L
[33] ADDY:AX SETREG 1 LLSHIFT AX
[34] OUTBUS SETBUS AX MESG 'shf AX '
[35] AX[0 TO 7] SETREG AX[0 TO 7] ADD Y
[36] OUTBUS SETBUS AX MESG 'ADD '
[37] MERGEAT L
[38] CHK:(AX[0]) MERGEAT REST
[39] AX[15] SETREG 1
[40] OUTBUS SETBUS AX MESG 'set 1 '
[41] MERGEAT CKQ
[42] REST:AX[0 TO 7] SETREG AX[0 TO 7] ADD Y
[43] OUTBUS SETBUS AX MESG 'ADD '
[44] CKQ:(SGNR) AX[0 TO 7] SETREG twosCMPL AX[0 TO 7]
[45] (SGNQ) AX[8 TO 15] SETREG twosCMPL AX[8 TO 15]
[46] D:OUTBUS SETBUS AX[8 TO 15] MESG 'QUOTIENT '
[47] OUTBUS SETBUS (twoTOdec AX[8 TO 15]) MESG '(base 10)'
[48] OUTBUS SETBUS AX[0 TO 7] MESG 'REMAINDER'
[49] OUTBUS SETBUS (twoTOdec AX[0 TO 7]) MESG '(base 10)'
[50] END:

Page 124

Execution trace: RtlNRESTORE (input data is 74 and 25)

Enter 8 bit Dividend: 0 1 0 0 1 0 1 0
Enter 8 bit Divisor : 0 0 0 1 1 0 0 1
shf AX ( 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 )
SUB    ( 1 1 1 0 0 1 1 1 1 0 0 1 0 1 0 0 )
shf AX ( 1 1 0 0 1 1 1 1 0 0 1 0 1 0 0 0 )
ADD    ( 1 1 1 0 1 0 0 0 0 0 1 0 1 0 0 0 )
shf AX ( 1 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 )
ADD    ( 1 1 1 0 1 0 0 1 0 1 0 1 0 0 0 0 )
shf AX ( 1 1 0 1 0 0 1 0 1 0 1 0 0 0 0 0 )
ADD    ( 1 1 1 0 1 0 1 1 1 0 1 0 0 0 0 0 )
shf AX ( 1 1 0 1 0 1 1 1 0 1 0 0 0 0 0 0 )
ADD    ( 1 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 )
shf AX ( 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 )
ADD    ( 1 1 1 1 1 0 0 1 1 0 0 0 0 0 0 0 )
shf AX ( 1 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0 )
ADD    ( 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 )
set 1  ( 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 )
shf AX ( 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 )
SUB    ( 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 )
ADD    ( 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 )
QUOTIENT  ( 0 0 0 0 0 0 1 0 )
(base 10) ( 2 )
REMAINDER ( 0 0 0 1 1 0 0 0 )
(base 10) ( 24 )

Page 125

Floating point operations:

Floating point operations can likewise be implemented in RTL. We will only illustrate this for floating point add (IEEE 32 bit format). Assume that both numbers are positive (if one is positive and one is negative, then the add operation becomes subtract; if both are negative, then the operation is add with the sign set to negative).

Architecture: two 32-bit floating point registers F1, F2; two 2-bit registers GB1, GB2 for guard bits; two 2-bit registers IB1, IB2 for manipulating the implied 1. Each of (IB1,F1[9…31],GB1) and (IB2,F2[9…31],GB2) can be treated as a single 27-bit register for addition and shifting.

The basic procedure for floating point addition/subtraction of numbers X and Y is as follows:

   Clear GB1, Clear GB2, Clear IB1, Clear IB2
   IB1[1] ← 1, IB2[1] ← 1                /* Set the implied 1's */
   F1 ← <first number>
   F2 ← <second number>
   /* Make F1 the larger of the two numbers in magnitude */
   IF F1[1…8] < F2[1…8]
      Swap(F1,F2)
   ENDIF
   IF F1[1…8] = F2[1…8]
      IF F1[9…31] < F2[9…31]
         Swap(F1,F2)
      ENDIF
   ENDIF
   /* shift F2's mantissa to line up the two exponents */
   Shift (IB2,F2[9…31],GB2) right by (F1[1…8] - F2[1…8]) bits
   IF F1[0] = F2[0]
      /* add mantissas including the extra bits */
      (IB1,F1[9…31],GB1) ← (IB1,F1[9…31],GB1) + (IB2,F2[9…31],GB2)
   ELSE   /* subtract */
      (IB1,F1[9…31],GB1) ← (IB1,F1[9…31],GB1) - (IB2,F2[9…31],GB2)
      IF (IB1,F1[9…31],GB1) is all zeroes
         Clear F1                        /* special case when result is 0 */
         Exit
      ENDIF
   ENDIF
   /* normalize by shifting until IB1 has the implied 1 */
   IF IB1[0] = 1
      Shift (IB1,F1[9…31],GB1) right by 1
      F1[1…8] ← F1[1…8] + 1              /* increment the exponent */
   ENDIF
   WHILE IB1[1] = 0                      /* normalization */
      Shift (IB1,F1[9…31],GB1) left by 1
      F1[1…8] ← F1[1…8] - 1              /* decrement the exponent */
   ENDWHILE
   F1[31] ← GB1[0]                       /* round the result */

Remark: special cases (e.g., exponent all 0's) are not considered.
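Before looking at the RTL program, the add path of the procedure can be sketched in Python for two positive operands. This is an illustration under simplifying assumptions (fp_add_positive is an invented name; operands are normalized and positive; only the add case is shown), using the same round rule as above: the last mantissa bit is set from the first guard bit.

   def fp_add_positive(e1, m1, e2, m2):
       # operands are (biased exponent, 23-bit mantissa field); the implied 1
       # and two guard bits are carried explicitly, as in the procedure above
       if (e1, m1) < (e2, m2):                      # make operand 1 the larger
           e1, m1, e2, m2 = e2, m2, e1, m1
       f1 = ((1 << 23) | m1) << 2                   # 1.mmm... plus 2 guard bits
       f2 = (((1 << 23) | m2) << 2) >> (e1 - e2)    # line up the exponents
       f1 = f1 + f2                                 # add mantissas
       if f1 >> 26:                                 # carry past the implied 1
           f1 >>= 1
           e1 += 1                                  # increment the exponent
       m = (f1 >> 2) & ((1 << 23) - 1)              # drop the guard bits
       m = (m & ~1) | ((f1 >> 1) & 1)               # round: last bit <- guard bit
       return e1, m

   # 232.125 + 1.03125 = 233.15625 = 1.110100100101 (2) x 2^7
   e, m = fp_add_positive(134, 0b11010000010000000000000,
                          127, 0b00001000000000000000000)
   print(e - 127, format(m, '023b'))   # 7 11010010010100000000000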

Page 126

UNFRTL program for implementing 32-bit IEEE add/subtract (decided by the signs of the numbers):

[0] RtlADDIEEE
[1] * Addition or subtraction determined by signs
[2] DEFREG:
[3] * Mantissa has 2 for overflow, 1 for implied 1, 2 as guard bits
[4] MANT1(28)
[5] MANT2(28)
[6] * Exponent has 2 for overflow; special cases are not handled
[7] EXP1(10)
[8] EXP2(10)
[9] SGN1(1)
[10] SGN2(1)
[11] DEFBUS:
[12] BEGIN:
[13] * Clear extra bits and set implied ones (assumes normal case)
[14] MANT1[0 1 2 26 27] SETREG 0 0 1 0 0
[15] MANT2[0 1 2 26 27] SETREG 0 0 1 0 0
[16] EXP1[0 1] SETREG 0 0
[17] EXP2[0 1] SETREG 0 0
[18] SGN1 SETREG 1 INBUS '1st number - enter 1 bit sign 1'
[19] EXP1[2 TO 9] SETREG 8 INBUS 'Enter 8 bit exponent 1'
[20] MANT1[3 TO 25] SETREG 23 INBUS 'Enter 23 bit mantissa 1'
[21] SGN2 SETREG 1 INBUS '2nd number - enter 1 bit sign 2'
[22] EXP2[2 TO 9] SETREG 8 INBUS 'Enter 8 bit exponent 2'
[23] MANT2[3 TO 25] SETREG 23 INBUS 'Enter 23 bit mantissa 2'
[24] OUTBUS SETBUS 'Adding'
[25] OUTBUS SETBUS SGN1,'-',EXP1[2 TO 9],'-',MANT1[3 TO 25]
[26] OUTBUS SETBUS SGN2,'-',EXP2[2 TO 9],'-',MANT2[3 TO 25]
[27] * Swap operands if necessary
[28] BRANCH(EXP1[2 TO 9] LGT EXP2[2 TO 9]; OK)
[29] BRANCH(EXP1[2 TO 9] LLT EXP2[2 TO 9]; DS)
[30] BRANCH(MANT1[3 TO 25] LGE MANT2[3 TO 25]; OK)
[31] DS:EXP1 SETREG EXP1 XOR EXP2
[32] EXP2 SETREG EXP1 XOR EXP2
[33] EXP1 SETREG EXP1 XOR EXP2
[34] MANT1 SETREG MANT1 XOR MANT2
[35] MANT2 SETREG MANT1 XOR MANT2
[36] MANT1 SETREG MANT1 XOR MANT2
[37] * Sign of F1 determines sign of result
[38] SGN1 SETREG SGN1 XOR SGN2
[39] SGN2 SETREG SGN1 XOR SGN2
[40] SGN1 SETREG SGN1 XOR SGN2
[41] * Line up exponents and add or subtract mantissas
[42] OK:MANT2 SETREG (twoTOdec EXP1 SUB EXP2) RLSHIFT MANT2
[43] (SGN1 LEQ SGN2) MANT1 SETREG MANT1 ADD MANT2
[44] (SGN1 LNEQ SGN2) MANT1 SETREG MANT1 SUB MANT2
[45] BRANCH(MANT1 LNEQ ZERO 28; NORM)
[46] SGN1 SETREG 0
[47] EXP1 SETREG ZERO 10
[48] MERGEAT D
[49] * If necessary shift implied 1 into position

Page 127

[50] NORM:(NOT MANT1[1]) MERGEAT L
[51] MANT1 SETREG 1 RLSHIFT MANT1
[52] EXP1 SETREG INCREMENT EXP1
[53] * If necessary normalize to get 1 into implied position
[54] L:(MANT1[2]) MERGEAT RND
[55] MANT1 SETREG 1 LLSHIFT MANT1
[56] EXP1 SETREG DECREMENT EXP1
[57] MERGEAT L
[58] * Round the result
[59] RND:MANT1[25] SETREG MANT1[26]
[60] D:OUTBUS SETBUS EXP1[2 TO 9] MESG 'Exponent - '
[61] OUTBUS SETBUS MANT1[3 TO 25] MESG 'Mantissa - '
[62] OUTBUS SETBUS SGN1,'-',EXP1[2 TO 9],'-',MANT1[3 TO 25]
[63] END:

Example Usage:

Operand 1 = 232.125 = 11101000.001 (2); normalized this is 1.1101000001 (2) × 2^7
   sign = 0
   biased exponent = 7 + 127 = 134 = 10000110 (2)
   mantissa = 11010000010000000000000 (implied leading 1)

Operand 2 = -1.03125 = -1.00001 (2); normalized this is -1.00001 (2) × 2^0
   sign = 1
   biased exponent = 0 + 127 = 127 = 01111111 (2)
   mantissa = 00001000000000000000000 (implied leading 1)

RTL simulator results:

Name of Machine: RtlADDIEEE
Processing RtlADDIEEE specifications and statements.
1st number - enter 1 bit sign 1: 0
Enter 8 bit exponent 1: 1 0 0 0 0 1 1 0
Enter 23 bit mantissa 1: 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2nd number - enter 1 bit sign 2: 1
Enter 8 bit exponent 2: 0 1 1 1 1 1 1 1
Enter 23 bit mantissa 2: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Adding
0 - 1 0 0 0 0 1 1 0 - 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 - 0 1 1 1 1 1 1 1 - 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Exponent - ( 1 0 0 0 0 1 1 0 )
Mantissa - ( 1 1 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 )
0 - 1 0 0 0 0 1 1 0 - 1 1 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
Normal Termination

Verification:
    1.1101000001 (2) × 2^7 =  1.110100000100 (2) × 2^7 = 232.12500
   -1.00001 (2) × 2^0      = -0.000000100001 (2) × 2^7 =  -1.03125
   difference               =  1.110011100011 (2) × 2^7 = 231.09375
   1.110011100011 (2) × 2^7 = 11100111.00011 (2) = 128+64+32+4+2+1+1/16+1/32 = 231.09375

Page 128

Computer organization:

Computer hardware is generally organized in three component areas:
1. Memory
2. Central Processing Unit (CPU)
3. Peripherals

Memory has already been described, and RTL provides the means for describing the CPU. The CPU components are given by the following:

General Hardware Organization Example (separate I/O:memory bus and CPU:memory bus)

[Block diagram: INPUT DEVICE and OUTPUT DEVICE attach via a communications link to MEMORY (shared data and instructions); the CPU attaches to memory over the CPU:memory bus through the MAR (Memory Address Reg) and MDR (Memory Data Reg); within the CPU are the CONTROL unit with instruction decode, the IR (Instruction Register), the IC (Instruction Counter), the SR (Status Register), the working registers (accumulator, index), and the ALU (Arithmetic and Logic Unit, including support registers Y and Z), all connected by an internal data path.]

Page 129

The memory and CPU can operate asynchronously (in effect, each has its own clock). For peripheral (I/O) devices, the CPU sends a signal to the I/O device and then a device controller independent of the CPU takes care of data transfer, which is to/from a designated memory location called an I/O buffer. When the transfer is complete, the I/O device sends a signal to the CPU to notify it that the buffer is now ready for access. If the CPU is performing an operation that requires the transfer to complete, then the CPU will need to pause operation (essentially by holding its clock at zero) until the completion signal is received. The user sees this as the system "hanging".

Hence, the CPU needs to be able to signal resource controllers (including memory). The signal can be as simple as taking a bit to 1, which when dropped back to 0 (by the external resource controller) causes the CPU clock to resume.

The elements within the CPU consist of
• A control unit
• An Arithmetic and Logic Unit (ALU)
• Registers for user program data (working registers)
• Registers for managing user programs

The control unit:

The control unit has circuitry for signaling data transfers to and from memory. Two registers are employed for controlling the transfer:
1. The memory address register (MAR), which has the memory address for the transfer
2. The memory data register (MDR), which has the data to be transferred to memory, or which receives the data transferred from memory.

The memory data register is attached to the CPU-memory bus as a "bus sense register" accessible by both memory and CPU. The memory control unit also must be able to access the MAR to determine the memory address to use.

The Von Neumann architecture stipulates that programs and data reside in the same memory area. The process of transferring a machine language instruction into the CPU is called an instruction fetch. The control unit has an internal working register, the instruction register (IR), where it stores the instruction fetched. The IR is attached to a circuit that decodes the instruction to
– extract the instruction operation
– determine the memory address the instruction is to act on
The instruction address can then be transferred to the MAR to initiate the transfer of the needed data to the MDR.

Arithmetic and Logic Unit:

The arithmetic and logic unit contains circuits such as those described using RTL and combinational logic for useful computational work, such as arithmetic operations and logical comparison. The ALU is signaled as to which operation's output is to be captured in its output register.

Page 130

Registers for user program data (working registers):

Conceptually, user program data must be placed in a work area where it can be retained to permit the cascading operations that characterize complex arithmetic expressions. Each of these registers is usually attached to the CPU bus, which effectively limits their number (the IBM 360 architecture provides 16, for example). Some of these registers may have special purposes; for example, an accumulator is a register in which the ongoing outcome of a computation is accumulated; an index register is one whose value is added to the instruction address to allow stepping through a sequence of memory locations (usually representing a data table).

The idea is to do as much work in the CPU as possible to avoid transfers to and from memory; for example, a swap sequence through a temporary memory location

   Read m1      (transfer m1 to T)
   Write T
   Read m2      (transfer m2 to m1)
   Write m1
   Read T       (transfer T to m2)
   Write m2

requires 6 Read/Write operations, whereas using working registers R1 and R2

   Read m1 to R1
   Read m2 to R2
   R1 ← R1 ⊕ R2    (CPU time for these is negligible
   R2 ← R1 ⊕ R2     compared to Read/Write)
   R1 ← R1 ⊕ R2
   Write R1 to m1
   Write R2 to m2

requires only 4 (plus no temporary location is needed).

Registers for managing user programs:

The Instruction Counter (IC) has the address of the machine language instruction to fetch after the instruction currently in the IR is finished. When an instruction is fetched, the IC is updated to point to the address of the next instruction. A branch instruction is simply one that can modify the value in the IC.

The Status Register (SR) is set by the control unit to flag results of comparisons, overflow conditions, and the like.

To facilitate register transfer, CPU elements are connected along one or more bus structures, with access to a bus controlled by 3-state logic blocks that allow a register's value onto the bus and select the registers to which it is transferred from the bus. A single bus organization has a structure such as the following:

Page 131

Single Bus CPU Organization

[Block diagram: the SR, IC, IR (feeding the instruction decoder & operand address circuit), MDR, MAR, working registers R0, R1, ..., Rn, the ALU input register Y, and the ALU output register Z are all attached to the CPU bus through in/out gating; the ALU takes input a from the bus and input b from Y, and delivers output c to Z; the MAR drives the address lines and the MDR the data lines of the CPU-memory bus, and separate control lines carry the ALU commands (ADD, SUB, etc.).]

This organization provides means for moving values in and out of selected registers. No more than 1 register can be gated onto the bus at any one time, or the signals will conflict. Any number of registers can simultaneously be loaded from the bus, however. Binary operations take one operand from register Y and the other from the bus. Registers such as the IR do not have a transfer to the bus because there is no reason to be transferring their contents back out of the register. The IC is not in this category, because its contents must eventually be transferred to the bus and into the MAR as part of fetching the next instruction to execute. The Register-Bus gating is as follows:

Register-Bus Gating

[Diagram: register R0 with an "in" (Write/Enable) control and an "out" (3-state enable) control connecting it to the bus.]

Data transfer example: Y ← R0 is accomplished by R0out, Yin.

Here R0in enables the R0 Write/Enable on the clock signal; R0out activates a 3-state logic connection from R0 to the bus.

Page 132

From the diagram we can determine the gating signals. Memory I/O control signals for Read from memory and Write to memory are also needed, along with ALU commands. A microcounter is used to select the current line of microcode from a table, and control signals are needed to selectively reset the counter.

Gating signals:
   ICout, ICin, Addrout, IRin, MARin, MDRout, MDRin, R0out, R0in, ..., Riout, Riin, ..., Yin, Zout, Zin

Memory I/O control signals:
   Read, Write, WaitM (hold CPU clock at 0 until memory read is done)

ALU commands:
   Add, Sub, Set carry (to 1), ShiftR Y, ShiftL Y, Clear Y, Compare, GT, LT, EQ, NE

Micro counter control signals:
   End

A line of microcode consists of a sequence of bits which give the values for each of the gating signals, the memory I/O control signals, the ALU commands, and the micro counter control signals (1 means the signal is active, 0 means it is inactive). Microcode organized in this fashion is called horizontal microcode. Several lines of microcode are needed to specify a machine language instruction.

A machine language instruction is divided into two parts:
1. the op code (specifies what the instruction is to do)
2. the operand (identifies the location of the data to be acted on)

In a basic machine, a machine language instruction occupies a single word of memory. If the word length is 32 and the op code takes 8 bits, then 256 different machine language instructions can be provided. The operand takes the remaining 24 bits. Since operands represent memory addresses, 2^24 = 16,777,216 different memory locations can be directly addressed. Larger memory address space can be accommodated by using operands that represent relative addresses rather than absolute addresses. Once the instruction is brought in from memory and transferred into the IR, the instruction interpreter can decode the op code part to point the microcounter to the right microprogram; the operand is gated to the bus when Addrout is signaled.
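To make "horizontal" concrete: one line of microcode is simply a bit vector indexed by signal name. A hypothetical Python rendering follows (the signal list here is a subset of the signals above, with the two-word names compacted; this is an illustration, not part of any real control store).

   SIGNALS = ['ICout', 'ICin', 'Addrout', 'IRin', 'MARin', 'MDRout', 'MDRin',
              'R0out', 'R0in', 'Yin', 'Zout', 'Zin',
              'Read', 'Write', 'WaitM',
              'Add', 'Sub', 'SetCarry', 'ClearY', 'End']

   def microline(*active):
       # one line of horizontal microcode: a 1 or 0 for every control signal
       return [1 if s in active else 0 for s in SIGNALS]

   # e.g., the 1st line of the instruction fetch shown on the next page
   fetch1 = microline('ICout', 'MARin', 'Read', 'ClearY', 'SetCarry', 'Add', 'Zin')
   print(fetch1)   # [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]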

Page 133

Example: Suppose that the machine language instruction ADD0 <address> means "increment the value in R0 by the value pointed to by <address>". A register used in this fashion is sometimes called an accumulator. A microcode sequence for the ADD0 instruction is as follows:

   Instruction fetch   1. ICout, MARin, Read, Clear Y, Set carry, Add, Zin
                       2. Zout, ICin, WaitM
                       3. MDRout, IRin
   Operand fetch       4. Addrout, MARin, Read
                       5. R0out, Yin, WaitM
   Accumulate          6. MDRout, Add, Zin
                       7. Zout, R0in, End

Each line of microcode represents the signals which are "on". All others are presumed to be "off".

The 1st line of microcode initiates instruction fetch and does the following (simultaneously):
• the IC is gated onto the bus and into the MAR (ICout, MARin)
• memory is signaled to Read the value addressed by MAR into the MDR
• the ALU's Y register is cleared to 0
• the carry-in for Add is set to 1
• the ALU Add circuit is selected (calculating bus + Y + 1 = address of the next instruction)
• the ALU result is gated into Z.

This all takes place in 1 CPU cycle. The idea is to do as much in each step as possible to minimize the number of CPU cycles required.

The 2nd line of the instruction fetch does "housekeeping", gating the address of the next instruction (as calculated by the 1st line) out of Z and into the IC, after which nothing else can be done until memory releases the Wait signal (Zout, ICin, WaitM). The simplifying assumption here is that each instruction occupies a single word and the machine is "word addressable" (as opposed to "byte addressable"). This speeds up instruction fetch since adding 1 to the IC points the IC to the next instruction. If variable length instructions are to be employed, then the IC increment must wait until the IR is fetched. Moreover, the instruction interpreter must provide the instruction length via a new transfer link (ILout) for the address calculation.

To complete the instruction fetch, the 3rd line of microcode gates the retrieved instruction from the MDR onto the bus and into the IR (MDRout, IRin). The same instruction fetch sequence starts the microcode for every machine language instruction!

Page 134

The 4th line begins the operand fetch, where
• the operand address determined by the instruction decoder is gated onto the bus and into the MAR (Addrout, MARin)
• memory is signaled to Read into the MDR the value addressed by MAR.

To set up for the accumulate, on the 5th microcode line
• the value in R0 is gated onto the bus and into Y (R0out, Yin)
• the CPU clock is suspended by issuing a Wait signal.

For accumulate, the 6th line of microcode becomes active when the Wait signal is released, at which point
• the MDR is gated onto the bus, ALU Add is triggered, and the result is captured in Z (MDRout, Add, Zin).

The 7th and final line of microcode finishes the accumulate, where
• Z is gated onto the bus and into R0 (Zout, R0in)
• End triggers instruction fetch on the next CPU cycle.

Since each line of microcode requires a CPU cycle, the accumulate instruction in this example requires 7 CPU cycles, 3 cycles for instruction fetch and 4 for the accumulate procedure. Designers spend a great deal of effort to make architectural adjustments which serve to reduce the number of CPU cycles required by machine language instructions, since each machine language instruction will be executed countless times in the operation of a computer. In particular, since instruction fetch is used for every instruction, it is advantageous that the machine architecture be structured for an instruction fetch requiring as few CPU cycles as possible (hence, designers devise strategies such as pipelines, which can be filled while the current instruction is being processed, taking advantage of the predictable nature of instruction fetch). A rule of thumb in writing microcode is that multiple "in" signals are permitted on a line of code, but only one "out" signal.

Microprograms:

The microcode for a machine language instruction such as ADD0 is called a microprogram. It can be stored in a table accessed by a counter. The instruction fetch is a microprogram in its own right, and for this architecture would occupy the 1st three lines of the table. When the instruction fetch triggers IRin, the instruction interpreter decodes the op code now in the IR to set the counter to point to the microprogram for the machine language instruction. When the last line of the microprogram is accessed, the End signal causes the counter to reset to 0 so the process will repeat, resulting in fetching and processing the next instruction.

Instruction fetch assumes that the IC contains the address of the instruction to transfer in from memory, so the question can be asked, "how does it get an address in the first place?". The answer to this is that the machine has a start/reset button, which forces a hard-wired

Page 135

address value into the IC when it is pressed. This address points to a bootstrap program that has been stored in memory (usually as ROM, so it can't be altered accidentally), which gets the program flow going for a user. The reset address has to be specified by the CPU designer and the bootstrap program has to be provided by the computer manufacturer (on a PC as part of the "ROM BIOS").

Translation from a programming language:

Programming languages such as C "compile" user programs into machine language instructions such as ADD0. For example, the C statement x = x + 3; could compile to the 3 machine language statements:

   LOAD0  <address of x>
   ADD0   <address of the constant 3>
   STORE0 <address of x>

The address values are those assigned by the C compiler (which is just another program). A compiler translates program statements to machine code, and among other things assigns an address location to each constant and variable, initializing memory for each constant as part of the process. Before the compiled program can be run, it has to be located in memory so that addresses match those assigned by the C compiler. A piece of system software called a "loader" handles this.

If LOAD0 means "transfer the value at <address> to R0" and STORE0 means "transfer the value in R0 to memory at <address>", the above three statements
1. transfer x to R0
2. add 3 to R0
3. transfer R0 to x
and x has been incremented by 3. Microcode for LOAD0 and STORE0 is as follows:

   LOAD0 : Addrout, MARin, Read, WaitM
           MDRout, R0in, End

   STORE0 : R0out, MDRin
            Addrout, MARin, Write, WaitM, End
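To make the compile-and-run story concrete, here is a toy Python interpreter for the three instructions used above. It is purely illustrative: the addresses and the memory image are invented, and a real machine would of course run microcode, not Python.

   memory = {100: 7, 101: 3}        # say x lives at 100, the constant 3 at 101
   program = [('LOAD0', 100), ('ADD0', 101), ('STORE0', 100)]

   r0 = 0
   for op, addr in program:
       if op == 'LOAD0':            # transfer the value at <address> to R0
           r0 = memory[addr]
       elif op == 'ADD0':           # increment R0 by the value at <address>
           r0 += memory[addr]
       elif op == 'STORE0':         # transfer R0 to memory at <address>
           memory[addr] = r0

   print(memory[100])               # 10: x has been incremented by 3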

Page 136

Branching:

The Status Register (SR) has condition code (CC) bits which are set by the ALU Compare instruction. The CC value in conjunction with the ALU instructions EQ, LT, GT, and NE provides the means for conditional branches. When one of the following combinations is in effect:
• CC = 1 0 and LT
• CC = 1 1 and GT
• CC = 0 - and EQ
• CC = 1 - and NE
ALU input from line "b" is routed to ALU output "c". Otherwise, for LT, GT, EQ, and NE, ALU input from line "a" is routed to ALU output "c". Hence, LT, GT, EQ, and NE cause either the bus (input a) or register Y (input b) to be routed to the ALU output (output c) depending on the current CC value in the status register (SR). LT, GT, EQ, and NE allow the construction of conditional branch instructions. The routing patterns are summarized by the following table:

   ALU routing for CC (Compare result)
   CC                 EQ        NE        LT        GT
   0 0 (bus = Y)      Y → Z     bus → Z   bus → Z   bus → Z
   0 1 (bus = Y)      Y → Z     bus → Z   bus → Z   bus → Z
   1 0 (bus < Y)      bus → Z   Y → Z     Y → Z     bus → Z
   1 1 (bus > Y)      bus → Z   Y → Z     bus → Z   Y → Z

Compare sets the first CC bit to 0 if the value in Y and the value on the bus are equal. If they are unequal, Compare sets the first CC bit to 1 and sets the 2nd bit to specify either bus < Y or bus > Y.

Example: Suppose that the machine language instruction BGT <address> means "branch on greater than to the instruction whose location in memory is given by <address>". Here the assumption is that the CC bits in SR were set by Compare in an earlier instruction; if they are 1 1, <address> is gated into the IC to replace the address of the next instruction computed during instruction fetch. Microcode is as follows:

   Set branch options   1. Addrout, Yin
                        2. ICout, GT, Zin
   Set branch           3. Zout, ICin, End

If the value of the CC is 1 1 then GT causes Y to be routed into Z and the branch is taken. For any other CC value, the bus (which has the current IC value) is routed into Z and the branch is not taken.
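A compact way to check the routing table is to model it directly. The Python sketch below uses invented helper names and is an illustration only; it routes Y (input b) to the output exactly for the CC/command combinations listed above, and the bus (input a) otherwise, which is all a conditional branch needs.

   def alu_route(cc, cmd, bus, y):
       # cc is the 2-bit tuple set by Compare; cmd is 'EQ', 'NE', 'LT', or 'GT'
       take_y = ((cmd == 'LT' and cc == (1, 0)) or
                 (cmd == 'GT' and cc == (1, 1)) or
                 (cmd == 'EQ' and cc[0] == 0) or
                 (cmd == 'NE' and cc[0] == 1))
       return y if take_y else bus

   # BGT puts the branch target in Y and gates the IC (next address) to the bus:
   print(alu_route((1, 1), 'GT', bus=200, y=500))   # 500: branch taken
   print(alu_route((1, 0), 'GT', bus=200, y=500))   # 200: branch not taken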

Page 137

Microcode programming:

A microcode programmer establishes the microcode for the machine language instructions that comprise the machine language for a given computing device. The table holding the microcode is sometimes called the control store and resides in the CPU for rapid access via the microcode counter. The End signal provides a 0-cycle "microbranch" to the instruction fetch. Additional microbranches may be provided to permit reuse of microcode sequences in addition to the one for instruction fetch. This is typically the case for microprogrammable machines, which provide means for making (limited) changes to the machine's control store. Note that the instruction interpreter is already making microbranches that aren't reflected in the microcode.

Machine language instructions are represented by mnemonics such as BGT, BLE, COMP, SUB, MOV12, ADD0, RSHIFT0, J and so forth. Instruction fetch is the same for all instructions, and its CPU cycle consumption adds to the CPU cycles consumed by the instruction's microprogram. To this point machine language instructions ADD0, LOAD0, STORE0 and BGT have been described. A sampling of others follows.

Other machine language instructions:

BLE calls for a branch if the condition is "not BGT". This could be done by testing for LT and then testing for EQ, but it is better handled by reversing the branch options for BGT:

   BLE : Set branch options   1. ICout, Yin              (this reverses what BGT
                              2. Addrout, GT, Zin         put on the bus and in Y)
         Set branch           3. Zout, ICin, End

If CC is 1 1 then GT causes the current IC, which is in Y, to be routed into Z, meaning the branch is not taken. Hence, the branch is taken for "not GT". Since LE is the same as "not GT", this microcode implements BLE.

Compares can always be structured as a comparison with 0 (e.g., A > B is the same as A-B > 0). Hence, for a COMP machine language instruction, the strategy can be comparison of the operand with 0. The microcode is then

   COMP : Addrout, MARin, Read, WaitM
          MDRout, Clear Y, Compare, End

Here the comparator in the ALU is comparing the bus to Y=0, setting the CC accordingly. Since COMP only compares to 0, it becomes the machine language programmer's responsibility to convert A > B to A-B > 0. Machine language instructions to accomplish this are as follows:

   LOAD0  A
   SUB0   B
   STORE0 TEMP
   COMP   TEMP

Page 138

Subtract is not quite the same as add, because the order of operands matters. The normal assumption is that SUB0 means subtract the value at <address> from R0. Assume also that the ALU Sub signal causes the value on the bus to be subtracted from Y. The microcode is

   SUB0 : Addrout, MARin, Read
          R0out, Yin, WaitM
          MDRout, Sub, Zin
          Zout, R0in, End

If there is a COMPR0 instruction for comparing R0 to 0, then the machine language program can be improved; e.g., it can be shortened to

   LOAD0 A
   SUB0  B
   COMPR0

where now the TEMP memory location has been eliminated. Microcode for COMPR0 is particularly simple:

   COMPR0 : R0out, Clear Y, Compare, End

Only 1 CPU cycle is required (other than the CPU cycles for instruction fetch). This is characteristic of "register to register" machine language instructions. Register to register instructions are advantageous because they require no memory access (other than instruction fetch). If MOV12 means copy R1 to R2, then its microcode is

   MOV12 : R1out, R2in, End

It should be noted that as with COMPR0, only 1 CPU cycle is needed. In the architecture as described, registers have to be explicitly identified by the mnemonic for machine language instructions, which is why neither COMPR0 nor MOV12 required an operand. To identify registers dynamically, for example in an instruction such as MOV <Ra>,<Rb>, a mechanism has to be added to the architecture for dynamically matching <Ra> and <Rb> to working registers. If a register id is specified by 4 bits (providing 16 possible register ids), an 8 bit operand is sufficient to specify the pair (<Ra>,<Rb>). The architectural addition needed is an 8-bit register R attached to the bus to serve the purpose of identifying <Ra> and <Rb>. The microcode for a machine language instruction using register operands then has to load R with the ids of the registers to be used. To see how this might work, suppose Rain is a signal that triggers a decoder which accesses the first 4 bits of the register identifier and Rbin does the same for the second 4 bits. In

Page 139

other words, if 0001 in the 1st 4 bits of R identifies R1, then R1in is triggered by the decoder for an Rain signal. To identify the register pair <Ra> and <Rb>, only the 1st 8 bits of the MOV instruction's operand field have to be set; e.g., for MOV 1,2 the operand bits are 00010010. If MOV <Ra>,<Rb> means "copy the contents of <Ra> to <Rb>", then microcode for MOV <Ra>,<Rb> is:

   Addrout, Rin          (move the immediate value to R)
   Raout, Rbin, End

Addrout in this case is putting register ids on the bus rather than an address (the operand is the immediate value). This is called immediate addressing (meaning the address for the operand is the immediate location on the instruction itself). The machine language code for copying R1 to R2 is then

   MOV 1,2

or for copying R2 to R0 is

   MOV 2,0

This kind of enhancement characterizes the release of an "extended" version of an existing computer architecture, where upward compatibility is being sought.

If RSHIFT0 means to shift R0 right by the value of the operand, then assuming the ShiftR Y command shifts Y by the value given by the bus, the microcode sequence is

   RSHIFT0 : R0out, Yin
             Addrout, ShiftR, Zin
             Zout, R0in, End

Note that in this case the address portion of the instruction is treated as a number. This is another example of immediate addressing, where it is the immediate value, rather than a value in memory, that is of interest. Immediate addressing provides an easy means for establishing values in registers. For example, if the instruction LOAD0I <value> means "load the immediate value (namely <value>) into R0", then the specific value encoded on the instruction is loaded into R0. More explicitly,

   LOAD0I 31

provides means to initialize R0 (in this case to the integer 31).

If the address given by the instruction is the address of the data item in memory, then the addressing is called direct addressing. In some circumstances, it is desirable that the address part of the instruction point to a memory location that holds the address of the desired data item. This is called indirect addressing. For example, BGTN could specify a branch on greater than, not to the address, but to

Page 140

the address stored at the address. This is useful for providing a table of "jump addresses" that point to different routines to be invoked depending on machine state. In contrast to BGT, a memory access is required to get the address to jump to:

   BGTN : get the indirect address   1. Addrout, MARin, Read, WaitM
                                     2. MDRout, Yin
          Set branch options         3. ICout, GT, Zin
          Set branch                 4. Zout, ICin, End

A jump instruction is an unconditional branch and is very simple to construct:

   J : Addrout, ICin, End

Index register:

To process a table, it is useful to have an index register whose value is automatically added onto the operand before it is transferred to the MAR. For example, suppose that the instruction ADD0X means "accumulate in R0 indexed by R2"; i.e., R2 is designated to be the index register. <address> for ADD0X provides a "base address" for a table. A specific entry in the table is obtained by adding R2 to the base address. The microcode for ADD0X is:

   ADD0X : Adjust address by index   1. Addrout, Yin
                                     2. R2out, Add, Zin
           Operand fetch             3. Zout, MARin, Read
                                     4. R0out, Yin, WaitM
           Accumulate                5. MDRout, Add, Zin
                                     6. Zout, R0in, End

If indirect addressing is combined with indexing, a jump table can be easily processed. A jump table typically holds the addresses of the programs that a process must select from among dynamically (e.g., an operating system service routine to process an interrupt flag raised by a device controller). If JNX means "jump to the machine language instruction located at the address specified by <address> adjusted by the index", then the microcode is

   JNX : Adjust address by index           1. Addrout, Yin
                                           2. R2out, Add, Zin
         get the indirect address          3. Zout, MARin, Read, WaitM
         and move it to the IC             4. MDRout, ICin, End

The instruction operand provides the base address for the table. Incrementing the operand by the index (line 2) changes the address to a location further along in the table. This is the address of the value to be retrieved, and it is sent to the MAR to retrieve the table entry (line 3). The retrieved value (line 4) is then transferred to the IC so that the instruction executed next will be the one whose memory location is stored in the table.
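The JNX pattern is the machine-level ancestor of an ordinary dispatch table. A small Python analogue follows (the function names are invented for illustration); the table entries play the role of the stored jump addresses.

   def service_a():
       return 'routine A'

   def service_b():
       return 'routine B'

   jump_table = [service_a, service_b]   # the stored "jump addresses"

   def jnx(table, index):
       # adjust the base by the index, fetch the entry, jump through it
       return table[index]()

   print(jnx(jump_table, 1))             # routine B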

Page 141

If the table entries are the addresses of programs, then the effect of the jump is to start up the program whose address was retrieved from the table.

Logically, we have the following hierarchy for an instruction (op code + operand):

   operand → immediate value (immediate address)
           → direct address
           → indirect address

These examples demonstrate why it is desirable to have machine language instructions that utilize immediate or indirect addressing. Suppose that designated bits within the opcode specify if addressing is to be immediate, direct, or indirect. Then additional micro counter control signals can be added which respond to these. Let Endi reset the micro counter to 0 if the designated bits specify immediate addressing. Let Endd reset the micro counter to 0 if the designated bits specify direct addressing. With these additional micro branches, a single microprogram can serve for all 3 addressing modes. A typical way to specify the addressing mode is to append a qualifier to the instruction mnemonic; e.g., LOAD0* for immediate, LOAD0 for direct, and LOAD0@ for indirect. Microcode for LOAD0 that uses this capability is as follows:

   LOAD0 : 1. Addrout, MARin, R0in, Read, WaitM, Endi
           2. MDRout, MARin, R0in, Read, WaitM, Endd
           3. MDRout, R0in, End

On line 1, the program ends with the immediate value (<operand>) in R0 if addressing is immediate. On line 2, the program ends with the value directly fetched from memory transferred into R0. Otherwise the retrieved indirect value is transferred into R0 in line 3. The Read in line 1 is "anticipatory" in case addressing is direct. If addressing is direct, the line 1 transfer into R0 is overridden by line 2. Likewise, the Read in line 2 is anticipatory in case addressing is indirect, and if so the line 2 transfer into R0 is overridden by line 3.
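The effect of the three addressing modes on LOAD0 can be mimicked in a few lines of Python; in the sketch below (illustration only), Endi and Endd appear as early returns, and the memory contents are invented.

   memory = {50: 99, 99: 7}

   def load0(operand, mode):
       if mode == 'immediate':        # Endi: the operand itself is the value
           return operand
       value = memory[operand]        # the anticipatory Read
       if mode == 'direct':           # Endd: one memory access suffices
           return value
       return memory[value]           # indirect: the fetched value is an address

   print(load0(50, 'immediate'))      # 50
   print(load0(50, 'direct'))         # 99
   print(load0(50, 'indirect'))       # 7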

Page 142

Simplified Instructional Computer (SIC):

A widely used architecture for instruction in systems architecture and programming is the SIC machine described by Beck (Systems Software: An Introduction to Systems Programming, Addison-Wesley). This machine incorporates the kind of CPU elements that have been discussed, and its machine language can be easily represented using the microprogramming techniques just covered. First of all, the SIC hardware organization can be represented by almost the same block diagram exhibited earlier.

SIC Hardware Organization (separate I/O:memory bus and CPU:memory bus)

[Block diagram: the same structure as the earlier general hardware organization, with the Instruction Counter renamed the Program Counter (PC), the Status Register renamed the Status Word (SW), and the working registers specified as A (accumulator), X (index), and L (link); INPUT and OUTPUT devices attach to MEMORY (shared data and instructions) over a communications link, and the CPU attaches to memory through the MAR (Memory Address Reg) and MDR (Memory Data Reg), with CONTROL (instruction decode), the IR, and the ALU with support registers Y and Z inside the CPU.]

Page 143 Note that only minor modifications are needed in the structure of this diagram. In essence, the working register set has been specified to consist of an accumulator A, an index register X, and a link register L. The Instruction Counter and Status Register are renamed and no other changes are necessary. For the CPU organization, the diagram becomes

Single Bus CPU Organization for Implementing SIC

[Block diagram: identical in structure to the earlier single bus CPU organization, with SW, PC, the instruction decoder & operand address circuit, IR, MAR, MDR, and registers L, X, A attached to the CPU bus; the ALU takes input a from the bus and input b from Y, delivering output c to Z, and the MAR and MDR connect to the CPU-memory bus (address lines, data lines, control lines for ADD, SUB, etc.).]

The SIC ADD instruction accumulates in A instead of R0. COMP is exactly as described already, BGT is named JGT, and so forth. Load instructions are dubbed LDA, LDX, and LDL, respectively. Shift is not provided in the basic SIC machine, but is available under the extended version (SIC/XE), which requires means for dynamically identifying registers as related earlier. Arithmetic is available only for register A in the basic SIC machine, but is available for all registers under the extended architecture. Immediate and indirect addressing are available only for the SIC/XE version of the machine.

We've already seen the reason for having a register designated to provide indexing. The link register is one whose use includes automatic provision of the return address when jumping to a subroutine. In the basic SIC machine, this is called JSUB, which simply jumps to the address given by its operand after setting the link register. The microcode is as follows:

   JSUB : ICout, Lin
          Addrout, ICin, End

Instruction fetch has already produced the address of the instruction immediately following the JSUB (the so-called return address). It is a simple matter to transfer it to register L before changing the IC to cause the jump to the subroutine.

The counterpart to JSUB is RSUB, which jumps to the address given by register L. Its microcode is:

RSUB:  Lout, ICin, End

Note that RSUB requires no operand. If the subroutine itself invokes JSUB, it must first save register L and restore it before executing RSUB, or the subroutine will return to itself! High-level languages provide a stack structure so that the programmer does not have to worry about this detail: the current value of L is pushed onto the stack as part of the subroutine call and popped off of the stack as part of the subroutine return. A sketch of this convention appears below.
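The following C sketch illustrates the nested-call hazard by modeling the IC and L registers directly. The JSUB/RSUB names come from SIC, but the explicit return-address parameter, the save variable, and the target addresses are invented for the example.

#include <stdio.h>

/* Toy machine state: instruction counter and link register. */
static unsigned ic, l;

/* JSUB: L <- return address, IC <- target (mirrors ICout,Lin / Addrout,ICin). */
void jsub(unsigned target, unsigned return_addr) {
    l  = return_addr;
    ic = target;
}

/* RSUB: IC <- L (mirrors Lout,ICin). */
void rsub(void) {
    ic = l;
}

int main(void) {
    unsigned saved_l;

    jsub(100, 12);     /* main calls subroutine A at 100; return address 12 */
    saved_l = l;       /* A must save L before making its own call... */
    jsub(200, 104);    /* A calls subroutine B at 200; L is overwritten */
    rsub();            /* B returns to 104 inside A */
    l = saved_l;       /* ...and restore it before its own RSUB */
    rsub();            /* A now correctly returns to 12, not to itself */
    printf("IC = %u\n", ic);  /* prints 12 */
    return 0;
}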

Architecture enhancements: By separating the bus into an input bus and an output bus (which can be selectively "tied" together), CPU cycles can be saved. Consider

[Figure: Dual Bus CPU Organization. The registers SR, IC, IR (feeding the instruction decoder & operand address logic), MAR, MDR, R0, R1, ..., Rn, and Y place values on the OUTPUT BUS and load values from the INPUT BUS. The ALU takes input a from register Y and input b from the output bus, and gates output c (setting the condition codes CCC) onto the input bus; a bus tie Bt can selectively connect the two buses. The MAR drives the address lines and the MDR the data lines of the CPU-memory bus, along with control lines for ADD, SUB, etc.]

Register Z has been eliminated in favor of the "input bus". Recall that for the single bus architecture we had

ADD0:
  Instruction fetch:  ICout, MARin, Read, Clear Y, Set carry, Add, Zin
                      Zout, ICin, WaitM
                      MDRout, IRin
  Operand fetch:      Addrout, MARin, Read
                      R0out, Yin, WaitM
  Accumulate:         MDRout, Add, Zin
                      Zout, R0in, End

For the dual bus modification, ADD0 becomes

ADD0:
  Instruction fetch:  Btenable, ICout, MARin, Read
                      ICout, Clear Y, Set carry, Add, ALUout, ICin, WaitM
                      Btenable, MDRout, IRin
  Operand fetch:      Btenable, Addrout, MARin, Read
                      Btenable, R0out, Yin, WaitM
  Accumulate:         MDRout, Add, ALUout, R0in, End

Note that altering the architecture in this manner reduces the CPU cycles for ADD0 by 1. Basically, splitting the bus eliminates the need for register Z. Two "out" signals are now allowed (as is the case for both line 2 and line 6), but only if they are on different buses and the buses are not tied.

Another technique that can be used is to utilize both halves of the clock cycle (half the time it is high, the other half low). By dividing the circuitry into components that activate on logic high (positive logic) and components that activate on logic low (negative logic), speed may be almost doubled. For example, the two lines of microcode

Addrout, Btenable, MARin, Read
R0out, Btenable, Yin, WaitM

(the operand fetch) do not have any register transfer signal conflicts (an "in" signal for the same register on each line), so the first could be accomplished while the clock is high and the second while the clock is low. This can be done by setting up the microcode table as two tables, the first of which provides microcode signals on clock high and the second on clock low. Each half of the table is addressed via the micro counter, so table entries that have the same address represent consecutive lines of microcode. A microprogram will now need to have an even number of lines, with End appearing on the last line, even if it is the only control signal on that line. Under this strategy, the same register cannot be set on consecutive lines of microcode. Also, since a register sets up when its flip-flop CK lines go low, an "out" for a register should not be on the line immediately following an "in". For these reasons, either the first entry or the second entry of a pair may need to be left empty (all signals off). To illustrate, if this approach is used, the single bus ADD0 becomes:

  Instruction fetch:  ICout, MARin, Read, Clear Y, Set carry, Add, Zin
                      Zout, ICin, WaitM
                      MDRout, IRin
  Operand fetch:      Addrout, MARin, Read
                      R0out, Yin, WaitM
  Accumulate:         MDRout, Add, Zin
                      Zout, R0in
                      End

CPU time to execute the microprogram is reduced from 7 CPU cycles to 5. For the dual bus scenario, it can be shown that the time can be reduced to 3 CPU cycles with only minor code rearrangement.
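To make the two-table idea concrete, here is a speculative C sketch of a control store split into clock-high and clock-low halves addressed by one micro counter. The signal names follow the notes, but the bitmask encoding and the phase loop are illustrative assumptions; also note the notes state the result takes 5 CPU cycles rather than the 4 this naive pairing of the eight lines would suggest, implying an empty half-entry that the listing above does not show.

#include <stdio.h>

/* Control signals as bit flags (hypothetical encoding). */
enum {
    IC_OUT = 1 << 0,  MAR_IN = 1 << 1,  READ    = 1 << 2,  CLEAR_Y = 1 << 3,
    SET_C  = 1 << 4,  ADD    = 1 << 5,  Z_IN    = 1 << 6,  Z_OUT   = 1 << 7,
    IC_IN  = 1 << 8,  WAITM  = 1 << 9,  MDR_OUT = 1 << 10, IR_IN   = 1 << 11,
    ADDR_OUT = 1 << 12, R0_OUT = 1 << 13, Y_IN  = 1 << 14, R0_IN   = 1 << 15,
    END = 1 << 16
};

/* One micro counter address selects a (clock-high, clock-low) pair. */
static const unsigned hi_table[] = {
    IC_OUT | MAR_IN | READ | CLEAR_Y | SET_C | ADD | Z_IN,
    MDR_OUT | IR_IN,
    R0_OUT | Y_IN | WAITM,
    Z_OUT | R0_IN
};
static const unsigned lo_table[] = {
    Z_OUT | IC_IN | WAITM,
    ADDR_OUT | MAR_IN | READ,
    MDR_OUT | ADD | Z_IN,
    END
};

int main(void) {
    /* Each full CPU cycle issues the high-phase word, then the low-phase word. */
    for (unsigned uc = 0; ; uc++) {
        printf("cycle %u high: 0x%05x\n", uc, hi_table[uc]);
        printf("cycle %u low : 0x%05x\n", uc, lo_table[uc]);
        if ((hi_table[uc] | lo_table[uc]) & END)
            break;  /* End resets the micro counter */
    }
    return 0;
}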

CPU-memory synchronization: At the microcode level, the CPU can trigger memory Read, Write, and WaitM signals. Circuitry for how these signals are used to synchronize the data transfers is as follows:

If Read or Write is active and memory is busy (i.e., Enable is 1), taking WaitM to 1 disables the CPU clock, effectively putting the CPU to sleep until Mhold is cleared by Enable going to 0.

[Figure: CPU-side synchronizing flip-flop. A D flip-flop clocked by Clock 1 produces the CPU clock; the Read, Write, and Mhold signals cross between the CPU side and the memory side.]

Note: Mhold is set whenever memory is busy and Read or Write goes to 1. Synchronization between the CPU and memory occurs when Enable clears Mhold by going to 0. Memory setup occurs while Enable is 1. The memory action occurs when Enable goes to 0.

[Figure: memory-side circuit and timing diagram. Mhold sets a D flip-flop whose output Mx is clocked by the asynchronous memory clock, Clock 2; Enable is derived from Mx and Clock 2 and clears Mhold. The timing diagram traces Read=1/WaitM=1, CPU Clock, Mhold, Mx, Enable, and Clock 2 across six clock periods, covering both cases described below. Mx follows Mhold on the trailing edge of Clock 2; Enable falls to 0 when Mx = 1 and Clock 2 rises (clearing Mhold).]

The timing considerations are given by the timing diagram, which shows two typical cases on the CPU clock line:

• A Read issued on a CPU cycle followed by WaitM issued on the next CPU cycle; e.g.,

  Zout, MARin, Read
  R0out, Yin, WaitM

• Both Read and WaitM issued on the same CPU cycle; e.g.,

  Addrout, MARin, Read, WaitM

If Mx = 0, Enable holds at 1. If Mx = 1, then when the asynchronous clock signal (Clock 2) rises to 1, Enable falls to 0 and the Mhold ff is cleared, with Mx falling to 0 when the clock falls to 0. Hence, when neither the Read nor the Write signal is active, Enable = 1 and Mhold = 0. This is how the timing diagram starts.

In the first case, when Read goes to 1 in the CPU, the Mhold ff is set to 1 and Mx goes to 1 when Clock 2 falls. Enable remains at 1 while memory sets up the MDR based on the value in the MAR. When Clock 2 rises again, Enable falls to 0 (note that memory has had full set-up time as represented by Clock 2) and Mhold is cleared. WaitM has been issued, but has no effect, since the CPU clock only responds to Mhold when Clock 1 falls. It should be noted that the CPU clock operates in phase with Clock 1 except when held low by the synchronizing ff. In the case illustrated, no CPU cycles are lost and the MDR is available on the next clock cycle.

In the second case, when Mhold rises to 1 (suspending the CPU clock because WaitM = 1), Mx does not rise as quickly to 1, because the asynchronous match-up of Clock 1 and Clock 2 is in a worst-case scenario and Mx only rises when Clock 2 falls again. When Mx does rise to 1, memory has had a full setup period (Clock 2 low) for when the CPU clock resumes in synch with Clock 1. Note that 2 CPU cycles have been lost.

Computer architects seek to define means to eliminate these kinds of "wait states" between memory and CPU. This may involve using cache memory (a fast intermediate memory between main memory and the CPU) to reduce clock differences. Of course, when the data is not in cache, the cache has to be reloaded, which may cost some CPU cycles. Another technique is to "pipeline" data into CPU registers, so that in many cases the next item is in the pipeline. The cache can be loaded while the pipeline is being processed, and if more than one pipeline is employed, one pipeline can be loaded while another is being processed. If successive memory retrievals cross wide stretches of memory, then neither caching nor pipelining will help (and may actually hinder, because loading them requires time). This is normally not an issue, since typical programs operate within a compact area of memory.
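As a rough event sketch of the handshake just described, the C below transcribes the edge rules from the notes (Mx follows Mhold on the trailing edge of Clock 2; Enable falls and clears Mhold when Mx = 1 and Clock 2 rises; WaitM with Mhold pending holds the CPU clock low). The function decomposition and the step-by-step driver are assumptions, not a faithful circuit model.

#include <stdio.h>

static int mhold, mx, enable = 1;

void clock2_falling(void) {
    mx = mhold;          /* Mx follows Mhold on the trailing edge of Clock 2 */
}

void clock2_rising(void) {
    if (mx) {            /* Enable falls when Mx = 1 and Clock 2 rises */
        enable = 0;      /* memory action occurs; MDR now holds the data */
        mhold  = 0;      /* ...which also clears Mhold */
    } else {
        enable = 1;      /* otherwise Enable holds at 1 (memory setup) */
    }
}

void cpu_issues_read(void) {
    mhold = 1;           /* Read (or Write) while memory is busy sets Mhold */
}

int cpu_clock_runs(int waitm) {
    return !(waitm && mhold);  /* WaitM with Mhold pending suspends the CPU clock */
}

int main(void) {
    cpu_issues_read();
    printf("CPU clock running? %d\n", cpu_clock_runs(1));  /* 0: suspended */
    clock2_falling();    /* Mx latches Mhold */
    clock2_rising();     /* Enable drops, Mhold cleared: memory done */
    printf("CPU clock running? %d\n", cpu_clock_runs(1));  /* 1: resumes */
    return 0;
}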

Inverting microcode: Microcode can be inverted to form a large logic circuit by examining which microcode signals are "on" at each time step T1, T2, ..., Tn. The microprogram sequences are examined at each of T1, T2, ..., Tn for the signals each microprogram turns on. For example, Zout is on at T2 for every case (since it is in line 2 of instruction fetch). It is also on at T6 for BGT, BLE, RSHIFT0, ADD0X, and JNX, at T7 for ADD0, SUB0, and BGTN, and at T9 for ADD0X. The Zout signal is then set in the large logic circuit via the combinational equation

Zout = T2 + T6•(BGT+BLE+RSHIFT0+ADD0X+JNX) + T7•(ADD0+SUB0+BGTN) + T9•ADD0X + ...

Similarly, ICout is set in instruction fetch and in branch instructions, leading to the combinational equation

ICout = T1 + T4•(ADD0+JSUB) + T5•(BGT+BGTN) + ...

Specialized signals such as WaitM and End are also represented by combinational equations:

WaitM = T2 + T4•(LOAD0+COMP+BGTN) + T5•(ADD0+SUB0+STORE0) + T6•JNX + T7•ADD0X + ...

End = T4•(COMPR0+MOV12+J) + T5•(LOAD0+STORE0+COMP) + T6•(BGT+BLE+RSHIFT0) + T7•(ADD0+SUB0+BGTN+JNX) + T9•ADD0X + ...

In this manner the combinational logic for setting the signals at each time step is described; a programmatic sketch of these equations appears after the diagram. A block diagram for the CPU as a circuit is then given by:

[Figure: the CPU as a hardwired circuit. The clock drives a counter (with reset and inhibit inputs) feeding a decoder that produces the timing signals T1, T2, ..., Tn; the IR feeds an instruction decoder producing one line per instruction (ADD0, SUB0, BGT, ...). These lines, together with status flags (e.g., Mhold) and the condition codes, enter the logic circuit that sets the control signals, including WaitM (& Mhold) and End.]
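A hedged sketch of what "inverting" amounts to: each control signal becomes a pure boolean function of the timing signals and the decoded instruction lines. The C below transcribes the Zout and ICout equations above (with their trailing "..." terms omitted); the struct layout and the 0/1 int convention are illustrative.

#include <stdio.h>

/* Decoded one-hot instruction lines and timing signals (0 or 1 each). */
struct signals {
    int T[10];  /* T[1]..T[9] used; index 0 unused */
    int ADD0, SUB0, BGT, BLE, BGTN, RSHIFT0, ADD0X, JNX, JSUB;
};

/* Zout = T2 + T6(BGT+BLE+RSHIFT0+ADD0X+JNX) + T7(ADD0+SUB0+BGTN) + T9*ADD0X */
int zout(const struct signals *s) {
    return s->T[2]
        || (s->T[6] && (s->BGT || s->BLE || s->RSHIFT0 || s->ADD0X || s->JNX))
        || (s->T[7] && (s->ADD0 || s->SUB0 || s->BGTN))
        || (s->T[9] && s->ADD0X);
}

/* ICout = T1 + T4(ADD0+JSUB) + T5(BGT+BGTN) */
int icout(const struct signals *s) {
    return s->T[1]
        || (s->T[4] && (s->ADD0 || s->JSUB))
        || (s->T[5] && (s->BGT || s->BGTN));
}

int main(void) {
    struct signals s = {0};
    s.T[2] = 1;                       /* time step T2 of any instruction */
    printf("Zout = %d\n", zout(&s));  /* prints 1: Zout fires at T2 */
    return 0;
}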

Either approach will control the register transfer requirements specified by the microcode. In contrast to using a generic component that applies microcode from a table to set control signals, the large circuit is "cast in concrete" as a combinational circuit. The gain is in efficiency. The loss is that making changes to the system microcode requires major circuit modification. Modern microprocessors employ microcode tables imbedded in "firmware", so that a need to make changes to microcode only involves modifying the imbedded table rather than the other circuitry. As a case in point, some years ago when the Intel Pentium was found to have a computational bug in its floating point routines, Intel was able to very quickly issue replacement processors that corrected the problem, because the floating point operations were defined by microcode.

Vertical vs. horizontal microcode: The microcode as examined to this point has been viewed "horizontally" as a sequence of bits. Manufacturers often group logically related signals in much the manner used earlier to dynamically identify registers. For example, a 1-of-16 decoder can be used to select a signal using just 4 bits, a reduction of 12 microcode bits. This is fine so long as no more than 1 of the 16 signals needs to be selected at a time. In particular, since only 1 "out" signal can be selected at a time, all "out" signals could be selected in this fashion. Microcode employing this technique is called vertical microcode. A sketch of the idea follows.
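As a hedged illustration of vertical microcode, the C fragment below packs all the "out" selections into one 4-bit field of the microword and decodes it to a 1-of-16 signal. The field position, code assignments, and names are invented for the example.

#include <stdio.h>

/* Hypothetical vertical microword: bits 0-3 encode which "out" gate opens. */
#define OUT_FIELD(word)  ((word) & 0xF)

/* 1-of-16 decode of the 4-bit field (code 0 reserved for "no out signal"). */
static const char *out_names[16] = {
    "none", "ICout", "Zout", "MDRout", "Addrout", "R0out", "R1out", "Lout",
    "Xout", "Aout"  /* remaining codes unused */
};

void decode_out(unsigned microword) {
    unsigned code = OUT_FIELD(microword);
    if (code && out_names[code])
        printf("assert %s\n", out_names[code]);  /* exactly one out line raised */
}

int main(void) {
    decode_out(0x3);  /* 4 bits select MDRout instead of 16 horizontal bits */
    return 0;
}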

Managing the CPU and peripheral devices: The CPU is the central resource for a computer, and its failure precludes any utilization of the system otherwise. Moreover, whenever the CPU clock has been inhibited, the system is effectively shut down, so steps that reduce the probability of this occurring are advisable. For example, if the CPU sends a signal to a printer and the CPU clock is inhibited until the printer responds, then no matter the state of the rest of the system, the computer is effectively shut down until a signal is received from the printer (perhaps the printer has not been turned on, or there is a cable problem). This tactic was commonly employed by earlier computers.

Memory-CPU synchronization is always necessary because of the tight coupling between the memory and CPU for instruction fetch, which means a possibility always exists for the synchronization circuitry to inhibit the CPU clock. Tactics such as instruction pipelines and memory caches are used to minimize this possibility. Peripheral devices are not tightly coupled to the CPU, so peripheral-CPU synchronization does not have to be directly achieved. The tactic employed is called direct memory access. Direct memory access takes advantage of the fact that memory is not driven by a counter (in contrast to the CPU). For this reason, data transfers between a peripheral device and memory can take place without suspending the memory clock to wait for the device to respond. Since peripheral devices operate at considerably slower speeds than either the CPU or memory, a number of clock cycles may go by before a device response takes place, during which time there can be continued memory-CPU activity. When using direct memory access, peripheral-CPU synchronization is taken care of indirectly by memory-CPU synchronization, and the CPU clock does not need to be inhibited while waiting for a peripheral device response.

When a program initiates a peripheral data transfer, the program usually must pause until the transfer has been accomplished. The CPU provides the signals that control a peripheral device's behavior, and there may be a "driver" program that causes the peripheral to step through its physical requirements. A peripheral device usually has its own "controller", which responds to the signals received from the driver. Regardless of strategy, a program handling a peripheral data transfer will reach a point where it can go no further without a response from the peripheral device.

To keep the valuable resource, the CPU, from being held up by slow peripheral response times, means are evidently needed to switch from a waiting program to one ready to run. This is normally accomplished by maintaining multiple programs in memory, devising means both for keeping track of these programs and for switching off to one of them when the currently executing program must pause. This is a primary task of the modern operating system. At the core of the operating system there is a "supervisor" program whose job is simply to manage the other programs in memory. When a program wants to access a peripheral device, it does so by executing a "supervisor call" (SVC) machine language instruction. The supervisor does "housekeeping" (saving the state of the program that executed the SVC), initiates the peripheral data transfer, and turns the CPU over to a new program (via a machine language instruction such as JNX, after restoring the state of the new program). In this way the CPU no longer gets suspended by programs that initiate peripheral data transfers.

When a program is suspended, the current machine state (register values, including the SR and the IC) must be saved. Note that the microcode for the SVC must save the program's current IC (in the manner of an RSUB) since starting the supervisor program changes the IC. Also, means must be provided to capture the SR.

In making the "context switch" to the new program, the supervisor must restore the machine state of the program being resumed. This information is maintained in state tables that are under the control of the supervisor.

It is important that the supervisor program periodically resumes execution so that every program in memory gets a turn with the CPU. Since SVC commands for peripheral device access may occur erratically, a timer is needed so that, in the absence of any program executing an SVC, program control returns to the supervisor after a defined period of time has elapsed. This implies that a "timer interrupt" is needed to force a null SVC if a peripheral access has not occurred in the meantime. Both the timer and an interrupt capability represent an added hardware need. At the hardware level, an interrupt is just a signal which, when present, redirects the End microbranch to a microprogram that captures the IC and starts the supervisor (via a microbranch to the SVC microprogram).

The interrupt capability can also be used as the means for a peripheral device to signal that it is done. When the supervisor program is run in response to an interrupt from a peripheral device, it conducts a context switch and resumes the program that executed the SVC which originated the peripheral device access. To determine the source of an interrupt, the supervisor needs to maintain information matching peripheral devices to programs that have a pending SVC action. For a timer interrupt, the supervisor simply needs to make a context switch to another program that is ready to run.

Since the supervisor program should not itself be interrupted, means are also needed to "mask" interrupts while the supervisor program is executing. A mask is just a (bit) signal which, when present, keeps an interrupt from manifesting itself (e.g., the interrupt line gated by a mask bit). Masking bits are set by the microcode of the SVC instruction, to be relinquished when the supervisor completes the context switch. The supervisor also must be able to deactivate an interrupt signal it has serviced so that the interrupt will not immediately manifest itself again on release of the mask. These kinds of considerations are covered in the context of an operating systems course. A sketch of the masking discipline appears below.
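The following C sketch is entirely illustrative (the flag names and handler shapes are assumptions, not SIC hardware): interrupts manifest only when the mask bit is off, the SVC microcode raises the mask, and the supervisor clears the serviced interrupt before releasing the mask.

#include <stdio.h>

/* Illustrative interrupt/mask latches (these would be hardware signals). */
static int irq_pending;   /* set by a device or the timer */
static int mask;          /* set by SVC microcode, cleared after the context switch */

/* The End microbranch effectively checks this: masked interrupts must wait. */
int interrupt_manifest(void) {
    return irq_pending && !mask;
}

void svc(void) {
    mask = 1;             /* supervisor must not itself be interrupted */
    /* ... save IC and SR, initiate the transfer, pick a ready program ... */
}

void supervisor_service_interrupt(void) {
    irq_pending = 0;      /* deactivate the serviced interrupt first... */
    mask = 0;             /* ...so it does not immediately manifest again */
}

int main(void) {
    irq_pending = 1;                        /* a device raises an interrupt */
    printf("%d\n", interrupt_manifest());   /* 1: End microbranch diverts */
    svc();
    printf("%d\n", interrupt_manifest());   /* 0: masked in the supervisor */
    supervisor_service_interrupt();
    printf("%d\n", interrupt_manifest());   /* 0: serviced and unmasked */
    return 0;
}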

In addition to providing this capability, the hardware also needs to support a capability of having "privileged" instructions (instructions that can only be used if the privilege signal has been activated; the SVC turns on this signal, in particular, so that the supervisor program can run privileged instructions). Privileged instructions (e.g., direct I/O instructions) are ones reserved for the use of operating system software. They typically are instructions whose use in ordinary programs could compromise the operating system's ability to manage the CPU (e.g., using a privileged I/O instruction leads to an interrupt when the I/O operation completes; the supervisor only has the means for handling the interrupt if it is the one issuing the I/O instruction).

A +5V commercial microprocessor, the Z80: The Zilog Z80 microprocessor is an 8-bit processor that was first issued in 1976. Running a superset of the Intel 8080 instruction set, the chip was in wide use by 1980, perhaps most notably in the Radio Shack TRS-80, which was the first personal computer made available via a mass distributor, foretelling the future direction computing was to take with desktop machines. The Z80's advantages (low cost, +5V compatibility) have made it a favorite to this day, although it is now used primarily for embedded applications where processing power is not an issue (e.g., device controllers). The features of the Z80 are as follows:

• 8-bit CPU in a 40-pin package
• 16 address lines
• 8 data lines
• 13 control lines
• power, ground, clock
• 158 instructions forming a superset of the Intel 8080
• 64KB address space

Another interesting feature of the Z80 is that it has a duplicate set of registers to support making a context swap on an interrupt; i.e., the registers of the interrupted program do not necessarily have to be saved (however, if a 2nd interrupt can occur, a register save will be needed). A sketch of this bank-switching idea appears below.
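A hedged C model of the duplicate register set: on interrupt entry the CPU simply swaps which bank is "active" instead of copying registers to memory. The struct and swap function are illustrative; on the real Z80 the swap corresponds to the EX AF,AF' and EXX instructions.

#include <stdio.h>

/* Two register banks; 'active' selects which one the CPU currently uses. */
struct bank { unsigned char a, f, b, c, d, e, h, l; };
static struct bank banks[2];
static int active;

/* Interrupt entry/exit: switch banks instead of saving registers to memory. */
void context_swap(void) {
    active ^= 1;  /* O(1); a 2nd nested interrupt would clobber the saved bank,
                     so a conventional register save would then be needed */
}

int main(void) {
    banks[active].a = 0x42;   /* the running program's accumulator */
    context_swap();           /* interrupt: the shadow bank becomes active */
    banks[active].a = 0x07;   /* the handler uses its own registers freely */
    context_swap();           /* return: the original A is intact */
    printf("A = 0x%02x\n", banks[active].a);  /* prints 0x42 */
    return 0;
}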

The chip pin-outs are as follows:

[Figure: Z80 pin-out (40-pin package).
  Addressing: A0-A15 (pins 30-40, 1-5)
  Data: D0-D7 (pins 14, 15, 12, 7-10, 13)
  Memory & I/O control: MREQ (19), IORQ (20), RD (21), WR (22), WAIT (24)
  Bus control: BUSRQ (25), BUSAK (23)
  Interrupt control: INT (16), NMI (17)
  Miscellaneous controls: Reset (26), M1 (27), RFSH (28), HALT (18)
  Power, ground, clock: +5V (11), GND (29), CK (6)]

Bus control enables the CPU to share the data bus with another device. To access the bus, the device signals its request via BUSRQ. When the CPU finishes its current operation it “floats” the address lines, data lines, I/O control and memory control lines, and signals back via BUSAK. The device is responsible for sending an interrupt signal to reactivate the CPU when it is finished with the bus. The MREQ line signals that the MAR is ready for a Read or Write operation. The IORQ line signals that the first 8 bits of the address bus have a valid I/O address for an I/O Read or Write operation. This signal is not used if memory-mapped I/O is being employed. The RD and WR lines apply to both memory and I/O operations. The WAIT line is used by memory or an I/O device to signal the CPU to enter a wait state until the signal is released (memory refresh continues via NOP operations – see below). The INT line is for maskable interrupts (the command set provides the software controls).

The NMI line is for non-maskable interrupts. The Reset line resets the internal CPU status and resets the instruction counter to 0. The M1 line is a signal that is output at the start of an instruction read (more than one memory fetch is necessary to get the whole instruction). The Z80 allocates extra time to the opcode read to provide time for refreshing dynamic memory. During the 2nd half of the opcode read, a counter value is placed on the first 7 address lines for the memory bank in need of refresh and the RFSH signal is raised. The HALT signal stops CPU activity (except that NOPs continue to be executed to maintain memory refresh); an interrupt is needed for the CPU to resume. The Z80 can be clocked cycle by cycle via the clock input.

Many designs of simple Z80 implementations have been devised. The following 6-chip design is from Tanenbaum.

[Figure: a 6-chip Z80 design. The Z80 shares an 8-bit data bus with a 2K×8 RAM, a 2K×8 EPROM, and a PIO. Address lines A0..A10 (which the Z80 can float) address the RAM and EPROM, A0 and A1 select among the PIO's registers, and A14/A15 are decoded into the three chip selects (CS). RD drives the output enables (OE) and WR the R/W input; a push button provides Reset.]

Addressing (chip selects are active on signal LOW):

A15 A14
 0   -    EPROM
 1   0    RAM
 1   1    PIO (memory mapped)
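A minimal C sketch of this address decode (the function name and enum are invented; the A15/A14 rules come from the table above):

#include <stdio.h>

typedef enum { EPROM_SEL, RAM_SEL, PIO_SEL } Device;

/* Decode a 16-bit address using only A15 and A14, as in the 6-chip design. */
Device decode(unsigned addr) {
    int a15 = (addr >> 15) & 1;
    int a14 = (addr >> 14) & 1;
    if (!a15) return EPROM_SEL;  /* A15 = 0: EPROM (the Z80 boots at address 0) */
    if (!a14) return RAM_SEL;    /* A15 = 1, A14 = 0: RAM */
    return PIO_SEL;              /* A15 = 1, A14 = 1: memory-mapped PIO */
}

int main(void) {
    printf("%d %d %d\n", decode(0x0000), decode(0x8000), decode(0xC000));
    /* prints 0 1 2: EPROM, RAM, PIO */
    return 0;
}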

The PIO is a (+5V compatible) chip providing parallel I/O ports. EPROM is erasable programmable read-only memory, which can be erased using a strong ultraviolet light source and programmed using an EPROM programmer. SRAM is static random access memory, a designation for memory that does not need to be refreshed to maintain its values (i.e., it is composed of flip-flops). The counterpart, dynamic memory, requires periodic refreshing and uses a different technology than gate logic. Dynamic memory provides greater capacity for less cost, but at the expense of speed.

The design is complete except that a control program is needed for the EPROM (systems software for I/O, including a display device, in particular), a CPU clock is needed, and a power supply is needed (3 D-cell batteries will do). Note that address 0 maps to the EPROM, which is where the Z80 initiates program load on Reset. Representative pricing for the configuration is as follows:

Z80               $1.39
PIO (MK 3881)     $1.49
7400 chip         $0.65
7410 chip         $0.29
2016 2K×8 SRAM    $1.39
2716 2K×8 EPROM   $2.25
Total             $7.46
