Data bias occurs when the data used to train or evaluate models is not representative of the real-world context, leading to skewed or unfair outcomes. It can arise from several sources, including sampling errors, historical prejudices embedded in the records, or flawed collection and labeling processes, and it can perpetuate or exacerbate existing inequalities.
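The effect of an unrepresentative sample can be made concrete with a small simulation. The sketch below (all numbers and group labels are hypothetical) builds a balanced population of two groups with different outcome values, then draws a sample in which one group is heavily over-represented; the sample statistic drifts far from the true population value, illustrating sampling bias:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical population: two equally sized groups.
# Group A members have outcome 1.0, group B members have outcome 0.0.
population = [1.0] * 500 + [0.0] * 500
true_mean = sum(population) / len(population)  # 0.5 by construction

# Biased collection process: group A items are 9x more likely
# to be selected than group B items.
weights = [9] * 500 + [1] * 500
biased_sample = random.choices(population, weights=weights, k=200)
biased_mean = sum(biased_sample) / len(biased_sample)

print(f"population mean:    {true_mean:.2f}")
print(f"biased sample mean: {biased_mean:.2f}")  # close to 0.9, far from 0.5
```

A model trained or evaluated on such a sample would systematically favor the over-represented group, even though nothing in the learning algorithm itself is at fault.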