
Representation of Integers, Signed Integers, and Reals (incl. Double Precision)

At a Glance

Why This Chapter Matters

A single 5-mark question from 2024 covers the full spectrum of number representation — unsigned integers, signed integers (2’s complement), and IEEE 754 double-precision floating-point. The marks are quick if the double-precision bit layout is memorised and the bias-1023 formula is applied correctly. This atom is a reliable minimal-effort maximum-marks target.

Minimum Theory

Unsigned Integers

An $n$-bit unsigned integer stores values from $0$ to $2^n - 1$. The value is:

$$V = \sum_{k=0}^{n-1} b_k \cdot 2^k$$

where $b_k$ is the $k$-th bit (LSB = $b_0$).
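
As a quick check of the formula, here is a minimal Python sketch. The function name `unsigned_value` is made up for illustration, and the bit string is assumed to be written MSB first, as on paper.

```python
def unsigned_value(bits: str) -> int:
    """Evaluate V = sum(b_k * 2**k) for a bit string written MSB first."""
    # Reversing the string puts b_0 (the LSB) at index k = 0.
    return sum(int(b) << k for k, b in enumerate(reversed(bits)))


print(unsigned_value("1101"))    # 13
print(unsigned_value("1" * 8))   # 255, the 8-bit maximum 2**8 - 1
```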

Signed Integers: Three Schemes

| Scheme | Positive $N$ | Negative $-N$ | Range ($n$ bits) |
|---|---|---|---|
| Sign-magnitude | $0\,\lvert N\rvert$ | $1\,\lvert N\rvert$ | $-(2^{n-1}-1)$ to $2^{n-1}-1$; two zeros |
| 1’s complement | $N$ | $\overline{N}$ (bitwise NOT) | $-(2^{n-1}-1)$ to $2^{n-1}-1$; two zeros |
| 2’s complement | $N$ | $\overline{N}+1$ | $-2^{n-1}$ to $2^{n-1}-1$; one zero |

2’s complement is universal in modern hardware. Its key advantage: ordinary binary addition works for both positive and negative numbers without special cases.
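
The three schemes can be compared side by side with a short Python sketch. The helper names below are hypothetical, and each function returns the $n$-bit pattern as a string; the last lines illustrate the point that plain binary addition (truncated to $n$ bits) already gives the right 2's complement result.

```python
def sign_magnitude(value: int, n: int) -> str:
    """Sign bit followed by the (n-1)-bit magnitude."""
    if abs(value) > 2 ** (n - 1) - 1:
        raise ValueError("out of range for sign-magnitude")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{n - 1}b")


def ones_complement(value: int, n: int) -> str:
    """Positive values as-is; a negative value is the bitwise NOT of its magnitude."""
    if abs(value) > 2 ** (n - 1) - 1:
        raise ValueError("out of range for 1's complement")
    mask = (1 << n) - 1
    pattern = value if value >= 0 else ~(-value) & mask
    return format(pattern, f"0{n}b")


def twos_complement(value: int, n: int) -> str:
    """A negative value is NOT(magnitude) + 1, which equals value mod 2**n."""
    if not -(2 ** (n - 1)) <= value <= 2 ** (n - 1) - 1:
        raise ValueError("out of range for 2's complement")
    return format(value & ((1 << n) - 1), f"0{n}b")


print(sign_magnitude(-5, 8))    # 10000101
print(ones_complement(-5, 8))   # 11111010
print(twos_complement(-5, 8))   # 11111011

# Ordinary unsigned addition of the patterns for -5 and 3, kept to 8 bits:
total = (int(twos_complement(-5, 8), 2) + int(twos_complement(3, 8), 2)) & 0xFF
print(format(total, "08b") == twos_complement(-2, 8))   # True
```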

Detecting overflow in 2’s complement addition. Overflow occurs if and only if two numbers of the same sign are added and the result has the opposite sign.
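
The sign rule can be sketched in Python as follows. The function name `add_with_overflow` is an illustrative assumption, and its inputs are assumed to be $n$-bit patterns already in 2's complement form (integers from $0$ to $2^n - 1$).

```python
def add_with_overflow(a: int, b: int, n: int = 8):
    """Add two n-bit 2's complement patterns; return (result pattern, overflow flag)."""
    mask = (1 << n) - 1
    sign = 1 << (n - 1)              # position of the sign bit
    result = (a + b) & mask          # discard any carry out of bit n-1
    same_sign_inputs = (a & sign) == (b & sign)
    # Overflow: the operands agree in sign but the result does not.
    overflow = same_sign_inputs and (result & sign) != (a & sign)
    return result, overflow


print(add_with_overflow(0b01111111, 0b00000001))   # (128, True): 127 + 1 wraps to -128
print(add_with_overflow(0b11111011, 0b00000011))   # (254, False): -5 + 3 = -2
```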

Floating-Point: IEEE 754 Double Precision

Bit layout (64 bits total):

$$\underbrace{s}_{1}\;\underbrace{e_{10}\cdots e_0}_{11}\;\underbrace{m_{51}\cdots m_0}_{52}$$

Value of a normalised number ($1 \le E \le 2046$):

$$x = (-1)^s \times 1.m \times 2^{E - 1023}$$

where $1.m$ means $1 + \sum_{k=1}^{52} m_{52-k} \cdot 2^{-k}$, so the leftmost stored mantissa bit $m_{51}$ has weight $2^{-1}$ and $m_0$ has weight $2^{-52}$.

Special values:

| $E$ (stored) | $m$ | Meaning |
|---|---|---|
| $0$ | $0$ | $\pm 0$ |
| $0$ | $\ne 0$ | Subnormal: $(-1)^s \times 0.m \times 2^{-1022}$ |
| $2047$ | $0$ | $\pm \infty$ |
| $2047$ | $\ne 0$ | NaN |
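
The normalised formula and the special cases can be combined into one decoder. The following is a minimal Python sketch, not a reference implementation: `decode_double` is a hypothetical name, the input is assumed to be a 64-character string of 0s and 1s, and the standard-library `struct` module is used only as a cross-check.

```python
import math
import struct


def decode_double(bits: str) -> float:
    """Decode a 64-character bit string field by field."""
    if len(bits) != 64:
        raise ValueError("expected exactly 64 bits")
    s = int(bits[0])
    E = int(bits[1:12], 2)           # 11-bit stored (biased) exponent
    m = int(bits[12:], 2)            # 52 stored mantissa bits as an integer
    sign = -1.0 if s else 1.0
    if E == 2047:                    # infinity or NaN
        return math.nan if m else sign * math.inf
    if E == 0:                       # zero or subnormal: 0.m * 2**(-1022)
        return sign * math.ldexp(m / 2**52, -1022)
    return sign * math.ldexp(1 + m / 2**52, E - 1023)   # normalised: 1.m


one = "0" + "01111111111" + "0" * 52
three = "0" + "10000000000" + "1" + "0" * 51
print(decode_double(one))     # 1.0
print(decode_double(three))   # 3.0

# Cross-check against the machine's own interpretation of the same bytes.
machine = struct.unpack(">d", int(three, 2).to_bytes(8, "big"))[0]
print(machine == decode_double(three))   # True
```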

Machine epsilon. The gap between $1$ and the next larger representable double, which is also the smallest power of two $\varepsilon$ for which $1 + \varepsilon \ne 1$ under round-to-nearest:

$$\varepsilon_{\text{mach}} = 2^{-52} \approx 2.22 \times 10^{-16}$$
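
This value is easy to confirm on any machine whose Python `float` is an IEEE 754 double (true of mainstream platforms, but stated here as an assumption). A short check:

```python
import sys

eps = 2.0 ** -52
print(eps == sys.float_info.epsilon)   # True
print(1.0 + eps > 1.0)                 # True: 1 + eps is the next double after 1
print(1.0 + eps / 2 == 1.0)            # True: the halfway case rounds back to 1
```

The last line relies on the default round-to-nearest, ties-to-even mode.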

Converting a decimal to double precision — procedure:

  1. Determine the sign bit $s$.
  2. Convert $|x|$ to binary.
  3. Normalise: write as $1.m \times 2^e$ (shift the binary point so that exactly one 1 is to the left).
  4. Biased exponent: $E = e + 1023$; convert $E$ to 11-bit binary.
  5. Mantissa: take the 52 bits after the binary point of $1.m$, padding with zeros on the right if needed.
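
The five steps can be mirrored almost line for line in Python. This is a sketch under stated assumptions: the names `to_double_fields` and `struct_fields` are hypothetical, the input is a nonzero decimal string in the normalised range, and the value must fit in 52 mantissa bits exactly (no rounding is attempted). Exact rational arithmetic from `fractions` stands in for pencil-and-paper work.

```python
import struct
from fractions import Fraction


def to_double_fields(x: str):
    """Return (s, E as 11-bit string, m as 52-bit string) by following the steps."""
    value = Fraction(x)
    if value == 0:
        raise ValueError("zero is a special case, not a normalised number")
    s = 1 if value < 0 else 0                # step 1: sign bit
    mag = abs(value)                         # step 2: exact magnitude
    e = 0                                    # step 3: find e with 1 <= mag < 2
    while mag >= 2:
        mag /= 2
        e += 1
    while mag < 1:
        mag *= 2
        e -= 1
    E = format(e + 1023, "011b")             # step 4: biased exponent, 11 bits
    frac = mag - 1                           # step 5: bits after the binary point
    bits = []
    for _ in range(52):
        frac *= 2
        bit = int(frac >= 1)
        bits.append(str(bit))
        frac -= bit
    if frac != 0:
        raise ValueError("value needs rounding; outside the scope of this sketch")
    return s, E, "".join(bits)


def struct_fields(x: float):
    """Split the machine's own 64-bit encoding into the same three fields."""
    raw = format(int.from_bytes(struct.pack(">d", x), "big"), "064b")
    return int(raw[0]), raw[1:12], raw[12:]


s, E, m = to_double_fields("0.15625")        # 0.15625 = 1.01 (binary) x 2**-3
print(s, E, m[:8])                           # 0 01111111100 01000000
print(to_double_fields("0.15625") == struct_fields(0.15625))   # True
```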

Question Archetypes

| Archetype | Recognition |
|---|---|
| decimal-to-double | Represent a given decimal number in IEEE 754 double-precision format |
| interpret-bit-pattern | Given a 64-bit pattern, decode the double-precision value |
| signed-range-or-2s-comp | State the range, or convert a negative number to 2’s complement |

decimal-to-double (1 question; 2024)

Recognition Cues

Solution Template

  1. Write $s = 0$ (positive) or $s = 1$ (negative).
  2. Convert $|x|$ to binary: repeated division by 2 for the integer part, repeated multiplication by 2 for the fractional part.
  3. Normalise to $1.f \times 2^e$.
  4. Compute biased exponent $E = e + 1023$; express as 11-bit binary.
  5. Write the 52 mantissa bits (the fractional part $f$, padded to 52 bits).
  6. Assemble: $s\;|\;E_{10}\cdots E_0\;|\;m_{51}\cdots m_0$.

Worked Example

2024 Paper 2, 2024-P2-Q8a (5 marks)

Represent the decimal number $-13.625$ in IEEE 754 double-precision (64-bit) floating-point format. Give the sign bit, biased exponent (in binary), and the first 10 bits of the mantissa.

Step 1: sign bit.

$x = -13.625 < 0$, so $s = 1$.

Step 2: convert $|x| = 13.625$ to binary.

Integer part: $13 = 8 + 4 + 1 = 1101_2$.

Fractional part: $0.625 \times 2 = 1.25 \to$ bit 1; $0.25 \times 2 = 0.5 \to$ bit 0; $0.5 \times 2 = 1.0 \to$ bit 1. Stop.

So $0.625_{10} = 0.101_2$.

Therefore: $13.625_{10} = 1101.101_2$.
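
The two hand methods used in this step, repeated division for the integer part and repeated multiplication for the fractional part, can be sketched in Python. The helper names are hypothetical, and `Fraction` is used so that the decimal input is held exactly rather than as a binary float.

```python
from fractions import Fraction


def int_to_binary(n: int) -> str:
    """Repeated division by 2; the remainders are read in reverse order."""
    digits = []
    while n > 0:
        n, r = divmod(n, 2)
        digits.append(str(r))
    return "".join(reversed(digits)) or "0"


def frac_to_binary(frac: str, max_bits: int = 52) -> str:
    """Repeated multiplication by 2; each integer part produced is the next bit."""
    f = Fraction(frac)
    bits = []
    while f and len(bits) < max_bits:
        f *= 2
        bit = int(f)          # 0 or 1, because 0 <= f < 2 here
        bits.append(str(bit))
        f -= bit
    return "".join(bits)


print(int_to_binary(13))         # 1101
print(frac_to_binary("0.625"))   # 101
```

The `max_bits` cap matters for fractions such as $0.1$, whose binary expansion never terminates.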

Step 3: normalise.

$$1101.101_2 = 1.101101 \times 2^3$$

Exponent $e = 3$.

Step 4: biased exponent.

$$E = 3 + 1023 = 1026_{10}$$

Convert $1026$ to 11-bit binary:

$$1026 = 1024 + 2 = 2^{10} + 2^1 \implies 10000000010_2$$

Step 5: mantissa (52 bits).

The fractional part of $1.101101$ is $101101\underbrace{00\cdots0}_{46}$. The first 10 mantissa bits are $1011010000$.

Step 6: assemble.

$$\underbrace{1}_{s}\;\underbrace{10000000010}_{E,\;11\text{ bits}}\;\underbrace{1011010000\cdots0}_{m,\;52\text{ bits}}$$

$$\boxed{s=1,\quad E = 10000000010_2,\quad m = 1011010000\underbrace{00\cdots0}_{42}}$$
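
For self-checking during practice, the result can be compared with the encoding a real machine produces. A minimal Python sketch using the standard-library `struct` module (big-endian packing is chosen so the sign bit comes first):

```python
import struct

raw = struct.pack(">d", -13.625)                      # 8 bytes, big-endian
bits = format(int.from_bytes(raw, "big"), "064b")     # 64-character bit string
s, E, m = bits[0], bits[1:12], bits[12:]

print(s)           # 1
print(E)           # 10000000010
print(m[:10])      # 1011010000
print(raw.hex())   # c02b400000000000
```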

Common Traps

Marks-Aware Writing

At 5 marks, an efficient answer has five numbered steps: sign bit, binary conversion of $|x|$, normalisation showing $e$, biased exponent computation and conversion to 11-bit binary, and the mantissa bits. Every step must be shown, because the examiner cannot award marks for a final bit pattern without the derivation. Stating the IEEE 754 field widths (1-11-52) in the opening line guards against being penalised for the wrong layout.

Practice Set

Only one historical question on this atom (shown above).
