Reinforcement Learning is one of the most popular and interesting fields of Machine Learning. In Reinforcement Learning, a software agent observes its environment, takes actions, and receives rewards in return. The objective of the agent is to learn to act in a way that maximizes the rewards it receives.
There are two ways in which you can make your agent learn to make decisions:
- Policy Learning
The algorithm a software agent uses to determine its actions is called its “policy”. The policy can be any algorithm you can think of. A policy can be thought of as a set of directions that tells your agent what to do. For example, for a robotic vacuum cleaner that is rewarded for picking up dust, a very simple policy could be to always move forward. If you think of a policy as a function, it has only one input: the state. But knowing in advance what your policy should be isn’t easy, and requires deep knowledge of the complex function that maps state to goal.
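To make the "policy as a function of state" idea concrete, here is a minimal sketch in Python. The state (a single flag saying whether an obstacle is ahead) and the action names are hypothetical, chosen only to fit the vacuum cleaner example; a real robot would have a much richer state.

```python
from typing import Callable

# Hypothetical types for illustration: the state is one boolean
# (is an obstacle directly ahead?) and an action is a plain string.
State = bool
Action = str
Policy = Callable[[State], Action]


def simple_policy(obstacle_ahead: State) -> Action:
    """Hand-written policy: move forward unless something is in the way."""
    return "turn_left" if obstacle_ahead else "forward"


print(simple_policy(False))  # forward
print(simple_policy(True))   # turn_left
```

Note that the function takes only the state as input and returns an action directly; nothing in it scores or compares alternative actions.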
- Q-Learning
Another way of making your agent learn is not to tell it explicitly what to do, but to give it a framework for making its own decisions. Unlike policy learning, Q-Learning takes two inputs: state and action. Q-Learning tells you the expected value of each action your agent could take in a given state.
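As a rough illustration of a function that takes both a state and an action, the sketch below stores estimated values in a small lookup table and picks the action with the highest value. The states, actions, and numbers are made up for the example and are not learned from anything.

```python
# Hypothetical Q-table: maps (state, action) pairs to an estimated value.
# The numbers are invented for illustration, not the result of training.
q_table = {
    ("dusty", "suck"): 1.0,
    ("dusty", "forward"): 0.2,
    ("clean", "suck"): 0.0,
    ("clean", "forward"): 0.5,
}
ACTIONS = ["suck", "forward"]


def greedy_action(state: str) -> str:
    """Return the action with the highest estimated value in this state."""
    return max(ACTIONS, key=lambda action: q_table[(state, action)])


print(greedy_action("dusty"))  # suck
print(greedy_action("clean"))  # forward
```

Here the agent is never told what to do in each state. It only compares the estimated values of its options, so changing the numbers in the table changes its behavior.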
One distinctive feature of Q-Learning is that it doesn’t just estimate the immediate value of taking an action in a given state; it also adds in all the future value that could follow from it. For readers familiar with corporate finance, Q-Learning is a bit like a discounted cash flow analysis: it takes all potential future value into account when determining the current value of an action (or asset). In fact, Q-Learning even uses a discount factor to model the fact that rewards in the future are worth less than rewards now.
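The discounting idea can be sketched in a few lines of Python. The first function sums a sequence of rewards, shrinking each one by the discount factor once per step into the future. The second applies the standard tabular Q-Learning update, which adds a learning rate (`alpha`) that the text above does not mention; the function names, the default values, and the dictionary-based table are assumptions made for this example.

```python
def discounted_return(rewards, gamma):
    """Sum rewards, weighting the reward at step t by gamma ** t."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Apply one tabular Q-Learning update in place and return the new value.

    q maps (state, action) to an estimated value; missing entries count as 0.
    alpha (learning rate) and gamma (discount factor) are illustrative defaults.
    """
    # Best value the agent believes it can get from the next state onward.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward: immediate reward + discounted future value.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Three rewards of 1 with gamma = 0.5 are worth 1 + 0.5 + 0.25.
print(discounted_return([1, 1, 1], 0.5))  # 1.75

q = {("s2", "a"): 2.0}
new_value = q_update(q, "s1", "a", reward=1.0, next_state="s2", actions=["a", "b"])
print(round(new_value, 3))  # 1.4
```

In the second call the immediate reward is only 1.0, but the updated estimate also reflects the discounted value of the best action available in the next state, which is the "future value" the paragraph above describes.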