Batch Gradient Descent is an optimization algorithm that minimizes a model's cost function by computing the gradient over the entire training set and applying one parameter update per pass: θ ← θ − η∇J(θ), where η is the learning rate. With a suitably chosen learning rate it converges to the global minimum of a convex cost function (and to a local minimum of a non-convex one), but because every update requires a full pass over the data, it becomes computationally expensive and slow on large datasets.
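
To make the update rule concrete, here is a minimal sketch in Python, assuming a linear-regression model with a mean-squared-error cost; the function name and synthetic data are illustrative, not from any particular library:

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.1, n_iters=1000):
    """Fit linear-regression weights by minimizing MSE with
    batch gradient descent (illustrative sketch)."""
    m, n = X.shape
    theta = np.zeros(n)                      # parameter vector
    for _ in range(n_iters):
        # Gradient of the MSE cost, averaged over the ENTIRE dataset
        grad = (X.T @ (X @ theta - y)) / m
        theta -= lr * grad                   # one update per full pass
    return theta

# Usage: recover y = 1 + 2x from noisy synthetic data
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
X = np.column_stack([np.ones_like(x), x])    # bias column + feature
y = 1.0 + 2.0 * x + rng.normal(scale=0.05, size=200)
print(batch_gradient_descent(X, y))          # approximately [1.0, 2.0]
```

Note that `grad` touches all `m` examples on every iteration; this is exactly the full-dataset pass that makes the method reliable on convex problems but costly at scale.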