4  Notation conventions

Learning objectives

After completing this chapter, you will be able to:

  • Read and write vectors, matrices, and scalars using standard notation
  • Interpret subscript and superscript conventions for indexing
  • Recognize common mathematical symbols used throughout the book
  • Navigate between mathematical notation and code implementations

Throughout this book, we use the following conventions:

We denote element \(i\) of vector \(\mathbf{v}\) as \(v_i\) or \([\mathbf{v}]_i\). Element \((i,j)\) of matrix \(\mathbf{A}\) is \(a_{ij}\) or \([\mathbf{A}]_{ij}\). We write \(\mathbf{a}_i\) for the \(i\)-th column of \(\mathbf{A}\) and \(\mathbf{a}_{i:}\) for the \(i\)-th row (as a row vector).
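
A minimal NumPy sketch of these conventions (the variable names are ours, and note that the math above is 1-indexed while NumPy arrays are 0-indexed):

```python
import numpy as np

A = np.arange(12.0).reshape(3, 4)  # a 3x4 matrix, for illustration
v = np.array([1.0, 2.0, 3.0])

v_1 = v[0]        # element v_1 of the vector (index 0 in code)
a_23 = A[1, 2]    # element a_{2,3} of the matrix
a_2 = A[:, 1]     # 2nd column, written \mathbf{a}_2
a_2row = A[1, :]  # 2nd row, written \mathbf{a}_{2:}
```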

4.1 Common sets

  • \(\mathbb{R}\): real numbers
  • \(\mathbb{R}^n\): \(n\)-dimensional real vectors
  • \(\mathbb{R}^{m \times n}\): \(m \times n\) real matrices
  • \(\mathbb{N}\): natural numbers \(\{0, 1, 2, \ldots\}\)
  • \(\mathbb{Z}\): integers
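
In code, these sets typically show up as dtypes and array shapes. A minimal sketch (the variable names are ours):

```python
import numpy as np

x = 3.14              # a scalar in R
v = np.zeros(5)       # a vector in R^5; shape (5,)
A = np.zeros((3, 5))  # a matrix in R^{3x5}; shape (3, 5)
n = -7                # an integer in Z
k = 4                 # a natural number in N (a non-negative integer)
```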

4.2 Functions and operators

  • \(\|\mathbf{v}\|\): norm of \(\mathbf{v}\); unless a subscript says otherwise, the Euclidean (\(L^2\)) norm
  • \(\|\mathbf{v}\|_p\): \(L^p\) norm
  • \(\langle \mathbf{u}, \mathbf{v} \rangle\) or \(\mathbf{u} \cdot \mathbf{v}\) or \(\mathbf{u}^T\mathbf{v}\): dot product
  • \(\nabla f\): gradient
  • \(\mathbb{E}[X]\): expectation
  • \(\text{Var}(X)\): variance
  • \(\text{softmax}(\mathbf{z})\): softmax function
  • \(\log x\): natural logarithm (base \(e\))
  • \(\log_2 x\): logarithm base 2
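
A minimal NumPy sketch of how these operators appear in code (the softmax helper is our own; in practice \(\nabla f\) usually comes from an autograd framework such as PyTorch rather than being written by hand):

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
z = np.array([1.0, 2.0, 3.0])

l2 = np.linalg.norm(v)         # ||v||, the L2 norm by default
l1 = np.linalg.norm(v, ord=1)  # ||v||_1
dot = u @ v                    # <u, v> = u^T v

def softmax(z):
    e = np.exp(z - z.max())    # subtract the max for numerical stability
    return e / e.sum()

p = softmax(z)                 # entries are positive and sum to 1
mean = z.mean()                # sample estimate of E[X]
var = z.var()                  # sample estimate of Var(X)
ln = np.log(2.0)               # natural logarithm (base e)
lg = np.log2(8.0)              # logarithm base 2
```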

4.3 Asymptotic notation

  • \(f(n) = O(g(n))\): \(f\) grows no faster than \(g\), up to a constant factor (asymptotic upper bound)
  • \(f(n) = \Omega(g(n))\): \(f\) grows at least as fast as \(g\) (asymptotic lower bound)
  • \(f(n) = \Theta(g(n))\): \(f\) and \(g\) grow at the same rate (tight bound)
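
For example, \(3n^2 + 5n = \Theta(n^2)\): asymptotic notation ignores constant factors and lower-order terms.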

4.4 Code conventions

In code examples, we use Python with NumPy and PyTorch. Matrix dimensions are often shown as comments, e.g., (batch_size, seq_len, d_model).
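
A hypothetical snippet in this style (the tensor names and sizes here are ours):

```python
import torch

batch_size, seq_len, d_model = 2, 8, 16

x = torch.randn(batch_size, seq_len, d_model)  # (batch_size, seq_len, d_model)
W = torch.randn(d_model, d_model)              # (d_model, d_model)

y = x @ W                                      # (batch_size, seq_len, d_model)
print(y.shape)                                 # torch.Size([2, 8, 16])
```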


This completes our review of prerequisites. We’ve covered the essential linear algebra, calculus, and probability needed to understand transformers deeply. In the next chapter, we’ll apply these tools to build the foundations of neural networks.