4 Notation conventions
After completing this chapter, you will be able to:
- Read and write vectors, matrices, and scalars using standard notation
- Interpret subscript and superscript conventions for indexing
- Recognize common mathematical symbols used throughout the book
- Navigate between mathematical notation and code implementations
Throughout this book, we use the following conventions:
- Scalars: Lowercase italic letters (\(x, y, n, \alpha, \beta\))
- Vectors: Lowercase bold letters (\(\mathbf{x}, \mathbf{v}, \mathbf{w}\)), assumed to be column vectors unless transposed
- Matrices: Uppercase bold letters (\(\mathbf{A}, \mathbf{W}, \mathbf{X}\))
- Sets: Uppercase calligraphic letters (\(\mathcal{D}, \mathcal{V}, \mathcal{X}\))
- Random variables: Uppercase italic letters (\(X, Y, Z\))
We denote element \(i\) of vector \(\mathbf{v}\) as \(v_i\) or \([\mathbf{v}]_i\). Element \((i,j)\) of matrix \(\mathbf{A}\) is \(a_{ij}\) or \([\mathbf{A}]_{ij}\). We write \(\mathbf{a}_i\) for the \(i\)-th column of \(\mathbf{A}\) and \(\mathbf{a}_{i:}\) for the \(i\)-th row (as a row vector).
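As a sketch of how these indexing conventions translate to NumPy (array values here are illustrative), keep in mind that mathematical notation is 1-indexed while code is 0-indexed:

```python
import numpy as np

# A is a 2x3 matrix; v is a vector in R^3 (values chosen for illustration).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
v = np.array([7.0, 8.0, 9.0])

# NumPy is 0-indexed, so the math element v_1 corresponds to v[0] in code.
v_1 = v[0]          # element v_1 of the vector
a_12 = A[0, 1]      # element a_{12} of the matrix
col_2 = A[:, 1]     # column a_2 of A
row_1 = A[0, :]     # row a_{1:} of A
```

The off-by-one between math and code indices is a perennial source of bugs; the comments above spell out the correspondence explicitly.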
4.1 Common sets
- \(\mathbb{R}\): real numbers
- \(\mathbb{R}^n\): \(n\)-dimensional real vectors
- \(\mathbb{R}^{m \times n}\): \(m \times n\) real matrices
- \(\mathbb{N}\): natural numbers \(\{0, 1, 2, \ldots\}\)
- \(\mathbb{Z}\): integers
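In NumPy terms, membership in these sets roughly corresponds to an array's shape and dtype. A minimal sketch (the specific values and dtypes are illustrative):

```python
import numpy as np

x = np.float64(2.5)    # a scalar in R
v = np.zeros(4)        # a vector in R^4: shape (4,)
A = np.zeros((3, 4))   # a matrix in R^{3x4}: shape (3, 4)
n = np.int64(7)        # an element of Z (and of N, since it is nonnegative)
```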
4.2 Functions and operators
- \(\|\mathbf{v}\|\): Euclidean norm; an unsubscripted norm denotes \(L^2\) by default
- \(\|\mathbf{v}\|_p\): \(L^p\) norm
- \(\langle \mathbf{u}, \mathbf{v} \rangle\) or \(\mathbf{u} \cdot \mathbf{v}\) or \(\mathbf{u}^T\mathbf{v}\): dot product
- \(\nabla f\): gradient
- \(\mathbb{E}[X]\): expectation
- \(\text{Var}(X)\): variance
- \(\text{softmax}(\mathbf{z})\): softmax function
- \(\log x\): natural logarithm (base \(e\))
- \(\log_2 x\): logarithm base 2
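Most of these operators have direct NumPy counterparts. The following sketch (with illustrative values) shows one way to compute each; the max-subtraction in the softmax is a standard trick for numerical stability, not part of the definition:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

norm2 = np.linalg.norm(v)         # ||v||, the Euclidean (L^2) norm
norm1 = np.linalg.norm(v, ord=1)  # ||v||_1, the L^1 norm
dot = u @ v                       # <u, v> = u^T v

z = np.array([1.0, 2.0, 3.0])
s = np.exp(z - z.max())           # subtracting max(z) avoids overflow
s /= s.sum()                      # softmax(z); entries sum to 1

ln = np.log(np.e)                 # natural log: log e = 1
lg2 = np.log2(8.0)                # base-2 log: log_2 8 = 3
```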
4.3 Asymptotic notation
- \(f(n) = O(g(n))\): \(f\) grows no faster than \(g\), up to a constant factor
- \(f(n) = \Omega(g(n))\): \(f\) grows at least as fast as \(g\), up to a constant factor
- \(f(n) = \Theta(g(n))\): \(f\) and \(g\) grow at the same rate, up to constant factors
4.4 Code conventions
In code examples, we use Python with NumPy and PyTorch. Matrix dimensions are often annotated in comments, e.g., `(batch_size, seq_len, d_model)`.
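The shape-comment convention looks like this in practice (a NumPy sketch with arbitrary dimension sizes; the same comments apply verbatim to PyTorch tensors):

```python
import numpy as np

batch_size, seq_len, d_model = 2, 5, 8

x = np.random.randn(batch_size, seq_len, d_model)  # (batch_size, seq_len, d_model)
W = np.random.randn(d_model, d_model)              # (d_model, d_model)
y = x @ W                                          # (batch_size, seq_len, d_model)
```

Annotating shapes this way makes dimension mismatches easy to spot by reading the comments alone, without running the code.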
This completes our review of prerequisites. We’ve covered the essential linear algebra, calculus, and probability needed to understand transformers deeply. In the next chapter, we’ll apply these tools to build the foundations of neural networks.