Chapter 04

Dimensionality Reduction & Representation Learning

Compress and visualize high-dimensional data while preserving structure.

Dimensionality reduction compresses high-dimensional data into fewer dimensions while preserving its important structure. This is essential for visualization, for denoising, and for feeding compact features to downstream models. The methods below range from classic linear projections (PCA, LDA, NMF) to nonlinear embeddings (t-SNE, UMAP, autoencoders).
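
To make "compress while preserving structure" concrete, here is a minimal NumPy sketch of linear compression through the singular value decomposition, which is essentially PCA. The `compress` helper and the synthetic data (three hidden factors mixed into 20 observed features, plus a little noise) are illustrative assumptions, not part of any library. Each sample is squeezed to 3 numbers, mapped back to 20 features, and the share of variance kept is reported.

```python
import numpy as np

def compress(X, k):
    """Project X onto its top-k principal directions and reconstruct it."""
    mean = X.mean(axis=0)
    Xc = X - mean                                    # center each feature
    # Rows of Vt are principal directions, sorted by decreasing singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                                # k-dimensional codes (n_samples x k)
    X_hat = Z @ Vt[:k] + mean                        # map the codes back to feature space
    retained = (S[:k] ** 2).sum() / (S ** 2).sum()   # fraction of total variance kept
    return Z, X_hat, retained

rng = np.random.default_rng(0)
# Hypothetical data: 3 latent factors spread across 20 features, plus small noise
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 20))

Z, X_hat, retained = compress(X, k=3)
print(Z.shape)  # (500, 3)
print(f"variance retained: {retained:.4f}")
print(f"mean squared reconstruction error: {np.mean((X - X_hat) ** 2):.5f}")
```

Because these data really have only three underlying factors, three components keep nearly all of the variance and the reconstruction error is on the order of the injected noise. On real data you would usually choose `k` by inspecting how the retained variance grows with the number of components.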

  • Start with PCA: it is fast, linear, and easy to interpret, so it makes a good baseline.
  • Use UMAP or t-SNE to visualize embeddings, clusters, and other high-dimensional datasets in 2D or 3D.
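
A common workflow that combines both tips is to reduce the data with PCA first and then run t-SNE on the result. The sketch below assumes scikit-learn is installed and uses its bundled digits dataset as a stand-in. The 30 PCA components and `perplexity=30` are illustrative choices, not tuned or recommended values.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Bundled example data: 1797 handwritten digits, 64 pixel features each
X, y = load_digits(return_X_y=True)

# Step 1: PCA as the fast, linear first pass (also strips some pixel noise)
pca = PCA(n_components=30, random_state=0)
X_pca = pca.fit_transform(X)
print(f"30 PCA components keep {pca.explained_variance_ratio_.sum():.1%} of the variance")

# Step 2: t-SNE on the PCA output to get a 2-D map for plotting
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
X_2d = tsne.fit_transform(X_pca)
print(X_2d.shape)  # (1797, 2)

# X_2d can now be scatter-plotted and coloured by y to inspect the clusters
```

Running PCA first reduces noise and speeds up t-SNE. If you prefer UMAP, the separate umap-learn package offers `umap.UMAP(n_components=2)` with the same `fit_transform` call. Keep in mind that scikit-learn's t-SNE has no `transform` method for new points, and that distances between clusters in a t-SNE map are not reliably meaningful.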
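Algorithms at a glance (what each is best for, and common fields and applications):

1. PCA: Principal Component Analysis
   Best for: compression, visualization, noise reduction
   Common fields: finance, biology, image processing, data preprocessing

2. t-SNE: t-distributed Stochastic Neighbor Embedding
   Best for: 2D/3D visualization
   Common fields: NLP embeddings, genomics, exploratory analysis

3. UMAP: Uniform Manifold Approximation and Projection
   Best for: fast nonlinear visualization and embedding
   Common fields: single-cell biology, text embeddings, image embeddings

4. LDA: Linear Discriminant Analysis
   Best for: supervised projection and classification
   Common fields: face recognition, medical classification

5. NMF: Non-negative Matrix Factorization
   Best for: parts-based decomposition
   Common fields: topic modeling, image decomposition, recommender systems

6. Autoencoders
   Best for: learned nonlinear embeddings
   Common fields: anomaly detection, compression, denoising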
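
Several of these methods share scikit-learn's `fit_transform` interface, which makes them easy to swap. The sketch below assumes scikit-learn and its bundled digits dataset, and projects the same data with PCA, LDA, and NMF. Using 9 components is an assumption chosen because LDA can produce at most (number of classes minus 1) dimensions, and there are 10 digit classes.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import NMF, PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Pixel intensities range from 0 to 16, so X is non-negative (required by NMF)
X, y = load_digits(return_X_y=True)

k = 9  # LDA allows at most n_classes - 1 = 9 components here

reducers = {
    # Unsupervised, linear: directions of maximum variance
    "PCA": PCA(n_components=k, random_state=0),
    # Supervised, linear: directions that best separate the labelled classes
    "LDA": LinearDiscriminantAnalysis(n_components=k),
    # Unsupervised, parts-based: X is approximated by W @ H with W, H >= 0
    "NMF": NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0),
}

embeddings = {}
for name, model in reducers.items():
    # All three accept y; only LDA actually uses it (PCA and NMF ignore it)
    embeddings[name] = model.fit_transform(X, y)
    print(name, embeddings[name].shape)  # (1797, 9) for each method

# NMF components are non-negative "parts": here, 9 stroke patterns of 64 pixels
H = reducers["NMF"].components_
print(H.shape, bool((H >= 0).all()))
```

The sketch mirrors the distinctions in the list: LDA is the only method that uses the labels `y`, and NMF only works because the input is non-negative. Autoencoders are usually built with a deep-learning framework rather than scikit-learn, so they are not covered by this sketch.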