Reinforcement learning
Official Definition
A machine learning approach where an agent learns to make sequences of decisions by taking actions in an environment and receiving feedback in the form of rewards or penalties.
Source: AIEOG AI Lexicon (Feb 2026), adapted from NIST AI 100-1
What reinforcement learning means in plain language
Reinforcement learning (RL) is a type of machine learning where an AI agent learns by trial and error. Unlike supervised learning (where the model learns from labeled examples), the RL agent takes actions in an environment, receives feedback (rewards for good outcomes, penalties for bad ones), and gradually learns which actions lead to the best results.
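The trial-and-error loop described above can be sketched with tabular Q-learning. This is a minimal illustration, not a production method: the corridor environment, state count, and hyperparameters are all invented for the example.

```python
import random

# Minimal tabular Q-learning sketch (environment and names invented for
# illustration). A 5-state corridor: the agent starts at state 0 and
# earns a reward of +1 for reaching state 4.
N_STATES = 5
ACTIONS = [-1, 1]                      # move left or right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment dynamics: clamp to the corridor; reward only at the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

def choose(state):
    """Epsilon-greedy: mostly exploit current estimates, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

random.seed(0)
for episode in range(200):
    state, done = 0, False
    for _ in range(100):               # cap episode length
        action = choose(state)
        nxt, reward, done = step(state, action)
        # Move the estimate toward reward + discounted best future value.
        target = reward + GAMMA * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = nxt
        if done:
            break

# After training, moving right is preferred in every non-goal state.
```

Note that the agent is never told the "right" answer, as a supervised model would be; it discovers the rightward policy purely from the reward signal.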
The most famous example is AlphaGo, which learned to play Go by playing millions of games against itself and receiving a reward signal for winning. In financial services, reinforcement learning is applied in algorithmic trading, dynamic pricing, portfolio optimization, and resource allocation.
Reinforcement learning from human feedback (RLHF) is a variation used to fine-tune large language models. Human evaluators rate model outputs, and the model learns to produce outputs that align with human preferences. This is how many commercial LLMs are trained to be helpful, harmless, and honest.
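The preference-learning step at the heart of RLHF can be sketched with a toy Bradley-Terry model: a scalar reward model is trained so that outputs humans preferred score higher than the outputs they rejected. All data and the single-feature reward model below are invented for illustration; real RLHF trains a neural reward model over full text.

```python
import math

# Toy sketch of preference learning (all numbers invented for illustration).
# Each model output is reduced to one feature x; the reward model is r(x) = w*x.
# Pairs are (x_preferred, x_rejected) from hypothetical human comparisons.
pairs = [(0.9, 0.2), (0.8, 0.1), (0.7, 0.3)]

w = 0.0   # reward-model parameter
lr = 1.0  # learning rate

for _ in range(200):
    for xp, xr in pairs:
        # Bradley-Terry: P(preferred beats rejected) = sigmoid(r_p - r_r)
        p = 1.0 / (1.0 + math.exp(-(w * xp - w * xr)))
        # Gradient ascent on the log-likelihood of the human preference
        w += lr * (1.0 - p) * (xp - xr)

# The learned reward now ranks every preferred output above its rejected pair.
```

In full RLHF, this learned reward then drives a policy-optimization step (commonly PPO) that fine-tunes the language model itself.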
Why it matters in financial services
Reinforcement learning presents unique governance challenges compared to supervised learning:
- Exploration risk. RL agents learn by trying different actions, including potentially harmful ones. In financial services, exploration must be constrained to prevent regulatory violations or customer harm.
- Delayed consequences. RL optimizes for long-term rewards, but the consequences of individual actions may not be apparent until much later. This complicates monitoring and attribution.
- Reward design. The behavior of an RL agent is determined by its reward function. A poorly designed reward function can lead to unintended and potentially harmful optimization.
- Opacity. RL policies (the learned decision rules) can be difficult to interpret and explain, creating challenges for compliance.
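The reward-design risk above can be made concrete with a toy example (the trade data and reward functions are invented for illustration): rewarding raw activity instead of risk-adjusted profit drives the agent toward harmful churn.

```python
# Toy illustration of reward misspecification (numbers invented for
# illustration): rewarding trade count instead of net profit.

trades = [  # (profit, cost) for a batch of candidate trades
    (5.0, 0.1), (-1.0, 0.1), (-2.0, 0.1), (0.5, 0.1),
]

def misspecified_reward(executed):
    """'More activity is better' — the reward the agent actually optimizes."""
    return len(executed)

def intended_reward(executed):
    """Net profit after costs — what the institution actually wanted."""
    return sum(p - c for p, c in executed)

# An agent optimizing the misspecified reward executes everything...
churning = trades
# ...while one optimizing the intended reward is selective.
selective = [t for t in trades if t[0] - t[1] > 0]
```

The churning agent scores higher on the misspecified reward while destroying value on the intended one, which is why reward functions deserve the same review rigor as model code.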
Key considerations for compliance teams
- Constrain exploration. Ensure RL agents operate within defined safety boundaries and cannot take actions that violate regulations or policies.
- Audit reward design. Review and document the reward function to ensure it aligns with institutional objectives and compliance requirements.
- Monitor behavior patterns. Track the actions RL agents take over time to detect unexpected or concerning behavior.
- Establish human override. Maintain the ability to override or shut down RL agents when their behavior is unacceptable.
- Validate in simulation. Test RL agents extensively in simulated environments before production deployment.
- Document the learning process. Record training parameters, reward functions, and behavioral constraints for each RL deployment.
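Constraining exploration, the first consideration above, is often implemented as action masking: the agent may only explore actions that pass a policy gate. This is a minimal sketch under assumed names; the action labels and the `is_permitted` check are hypothetical, not from any real trading system.

```python
import random

# Hypothetical action set and compliance gate (names invented for illustration).
ALL_ACTIONS = ["hold", "small_trade", "large_trade", "off_policy_trade"]

def is_permitted(action, position_limit_ok=True):
    """Policy gate: block actions outside approved bounds."""
    blocked = {"off_policy_trade"}
    if not position_limit_ok:
        blocked.add("large_trade")
    return action not in blocked

def safe_explore(q_values, epsilon=0.2):
    """Epsilon-greedy restricted to the permitted action set."""
    allowed = [a for a in ALL_ACTIONS if is_permitted(a)]
    if random.random() < epsilon:
        return random.choice(allowed)  # explore, but only within bounds
    return max(allowed, key=lambda a: q_values.get(a, 0.0))

q = {"hold": 0.1, "small_trade": 0.3, "large_trade": 0.2}
random.seed(1)
actions_taken = {safe_explore(q) for _ in range(100)}
# The blocked action never appears, no matter how much the agent explores.
```

Because the mask is applied before both the explore and exploit branches, the guarantee holds by construction rather than by training, which is the property auditors generally want to see documented.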
