Stochastic Gradient Descent (SGD) is an optimization algorithm that minimizes a model's loss function by iteratively updating the parameters using the gradient computed on a small, randomly sampled subset (mini-batch) of the training data rather than the full dataset. At each step, the parameters are moved a small distance, scaled by the learning rate, in the direction opposite this noisy gradient estimate. Because every update touches only a mini-batch, SGD is efficient in both computation and memory on large datasets and is well suited to online learning, and the noise in its gradient estimates can help it escape shallow local minima.
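To make the update rule concrete, here is a minimal NumPy sketch of mini-batch SGD applied to ordinary least-squares linear regression. The synthetic data, learning rate, batch size, and epoch count are illustrative assumptions rather than anything prescribed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem: y ≈ X @ true_w plus a little noise.
n_samples, n_features = 1000, 5
X = rng.normal(size=(n_samples, n_features))
true_w = rng.normal(size=n_features)
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

w = np.zeros(n_features)   # model parameters to learn
lr = 0.01                  # learning rate (step size)
epochs = 20
batch_size = 32            # size of the random subset used per update

for epoch in range(epochs):
    # Shuffle indices so each epoch visits mini-batches in a new random order.
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Gradient of mean squared error computed on the mini-batch only.
        grad = 2.0 / len(idx) * Xb.T @ (Xb @ w - yb)
        # SGD update: step against the (noisy) gradient estimate.
        w -= lr * grad

print("estimated w:", w)
print("true w:     ", true_w)
```

Only one mini-batch is held in memory per update, which is what keeps the method cheap per step on large datasets; the trade-off is that each gradient is a noisy estimate of the full-batch gradient, so the loss decreases on average rather than monotonically.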