4 Notation conventions
After completing this chapter, you will be able to:
- Read and write vectors, matrices, and scalars using standard notation
- Interpret subscript and superscript conventions for indexing
- Recognize common mathematical symbols used throughout the book
- Navigate between mathematical notation and code implementations
Throughout this book, we use the following conventions:
- Scalars: Lowercase italic letters (\(x, y, n, \alpha, \beta\))
- Vectors: Lowercase bold letters (\(\mathbf{x}, \mathbf{v}, \mathbf{w}\)), assumed to be column vectors unless transposed
- Matrices: Uppercase bold letters (\(\mathbf{A}, \mathbf{W}, \mathbf{X}\))
- Sets: Uppercase calligraphic letters (\(\mathcal{D}, \mathcal{V}, \mathcal{X}\))
- Random variables: Uppercase italic letters (\(X, Y, Z\))
We denote element \(i\) of vector \(\mathbf{v}\) as \(v_i\) or \([\mathbf{v}]_i\). Element \((i,j)\) of matrix \(\mathbf{A}\) is \(a_{ij}\) or \([\mathbf{A}]_{ij}\). We write \(\mathbf{a}_i\) for the \(i\)-th column of \(\mathbf{A}\) and \(\mathbf{a}_{i:}\) for the \(i\)-th row (as a row vector).
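As a sketch of how these indexing conventions translate to NumPy (array values here are illustrative), keep in mind that mathematical notation is 1-indexed while code is 0-indexed:

```python
import numpy as np

# A is a 2x3 matrix; v is a vector in R^3 (values chosen for illustration).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
v = np.array([7.0, 8.0, 9.0])

# NumPy is 0-indexed, so the math element v_1 corresponds to v[0] in code.
v_1 = v[0]          # element v_1 of the vector
a_12 = A[0, 1]      # element a_{12} of the matrix
col_2 = A[:, 1]     # column a_2 of A
row_1 = A[0, :]     # row a_{1:} of A
```

The off-by-one between math and code indices is a perennial source of bugs; the comments above spell out the correspondence explicitly.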
4.1 Common sets
- \(\mathbb{R}\): real numbers
- \(\mathbb{R}^n\): \(n\)-dimensional real vectors
- \(\mathbb{R}^{m \times n}\): \(m \times n\) real matrices
- \(\mathbb{N}\): natural numbers \(\{0, 1, 2, \ldots\}\)
- \(\mathbb{Z}\): integers
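In NumPy terms, membership in these sets roughly corresponds to an array's shape and dtype. A minimal sketch (the specific values and dtypes are illustrative):

```python
import numpy as np

x = np.float64(2.5)    # a scalar in R
v = np.zeros(4)        # a vector in R^4: shape (4,)
A = np.zeros((3, 4))   # a matrix in R^{3x4}: shape (3, 4)
n = np.int64(7)        # an element of Z (and of N, since it is nonnegative)
```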
4.2 Functions and operators
- \(\|\mathbf{v}\|\): Euclidean norm; an unsubscripted norm denotes \(L^2\) by default
- \(\|\mathbf{v}\|_p\): \(L^p\) norm
- \(\langle \mathbf{u}, \mathbf{v} \rangle\) or \(\mathbf{u} \cdot \mathbf{v}\) or \(\mathbf{u}^T\mathbf{v}\): dot product
- \(\nabla f\): gradient
- \(\mathbb{E}[X]\): expectation
- \(\text{Var}(X)\): variance
- \(\text{softmax}(\mathbf{z})\): softmax function
- \(\log x\): natural logarithm (base \(e\))
- \(\log_2 x\): logarithm base 2
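Most of these operators have direct NumPy counterparts. The following sketch (with illustrative values) shows one way to compute each; the max-subtraction in the softmax is a standard trick for numerical stability, not part of the definition:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

norm2 = np.linalg.norm(v)         # ||v||, the Euclidean (L^2) norm
norm1 = np.linalg.norm(v, ord=1)  # ||v||_1, the L^1 norm
dot = u @ v                       # <u, v> = u^T v

z = np.array([1.0, 2.0, 3.0])
s = np.exp(z - z.max())           # subtracting max(z) avoids overflow
s /= s.sum()                      # softmax(z); entries sum to 1

ln = np.log(np.e)                 # natural log: log e = 1
lg2 = np.log2(8.0)                # base-2 log: log_2 8 = 3
```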
4.3 Asymptotic notation
- \(f(n) = O(g(n))\): \(f\) grows no faster than \(g\), up to a constant factor
- \(f(n) = \Omega(g(n))\): \(f\) grows at least as fast as \(g\), up to a constant factor
- \(f(n) = \Theta(g(n))\): \(f\) and \(g\) grow at the same rate, up to constant factors
4.4 Code conventions
In code examples, we use Python with NumPy and PyTorch. Matrix dimensions are often annotated in comments, e.g., `(batch_size, seq_len, d_model)`.
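The shape-comment convention looks like this in practice (a NumPy sketch with arbitrary dimension sizes; the same comments apply verbatim to PyTorch tensors):

```python
import numpy as np

batch_size, seq_len, d_model = 2, 5, 8

x = np.random.randn(batch_size, seq_len, d_model)  # (batch_size, seq_len, d_model)
W = np.random.randn(d_model, d_model)              # (d_model, d_model)
y = x @ W                                          # (batch_size, seq_len, d_model)
```

Annotating shapes this way makes dimension mismatches easy to spot by reading the comments alone, without running the code.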
This completes our review of prerequisites. We’ve covered the essential linear algebra, calculus, and probability needed to understand transformers deeply. In the next chapter, we’ll apply these tools to build the foundations of neural networks.