Class QLearning
Q-Learning reinforcement learning algorithm.
Inheritance
System.Object
QLearning
Inherited Members
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.ToString()
Namespace: Mars.Components.Services.Learning
Assembly: Mars.Components.dll
Syntax
[Serializable]
public class QLearning
Remarks
The class provides an implementation of the Q-Learning algorithm, known as
off-policy Temporal Difference control.
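The following is a minimal usage sketch against the members documented below. EpsilonGreedyExplorationPolicy, ObserveState and Step are hypothetical stand-ins for an IExplorationPolicy implementation and an environment; they are not part of this assembly.
using Mars.Components.Services.Learning;

// Hypothetical exploration policy and environment hooks; replace them with the
// IExplorationPolicy implementation and state/reward source of your model.
var policy = new EpsilonGreedyExplorationPolicy(epsilon: 0.1);
var qLearning = new QLearning(states: 16, actions: 4, explorationPolicy: policy, randomize: true)
{
    LearningRate = 0.25,   // step size of each Q-value update, [0, 1]
    DiscountFactor = 0.95  // weight given to expected future reward, [0, 1]
};

for (var episode = 0; episode < 1000; episode++)
{
    var state = ObserveState();                  // hypothetical environment hook
    var done = false;
    while (!done)
    {
        var action = qLearning.GetAction(state); // action chosen via the exploration policy
        var (nextState, reward, terminal) = Step(state, action); // hypothetical environment step
        qLearning.UpdateState(state, action, reward, nextState);
        state = nextState;
        done = terminal;
    }
}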
Constructors
QLearning(Int32, Int32, IExplorationPolicy, Boolean)
Initializes a new instance of the QLearning class.
Declaration
public QLearning(int states, int actions, IExplorationPolicy explorationPolicy, bool randomize)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | states | Number of possible states. |
System.Int32 | actions | Number of possible actions. |
IExplorationPolicy | explorationPolicy | Exploration policy. |
System.Boolean | randomize | Whether to randomize initial action estimates. |
Remarks
The randomize parameter specifies whether initial action estimates should be randomized
with small values. Randomizing action estimates can be useful when greedy exploration
policies are used, since it prevents the same action from always being selected when several actions have equal estimates.
QLearning(Int32, Int32, IExplorationPolicy)
Initializes a new instance of the QLearning class.
Declaration
public QLearning(int states, int actions, IExplorationPolicy explorationPolicy)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | states | Number of possible states. |
System.Int32 | actions | Number of possible actions. |
IExplorationPolicy | explorationPolicy | Exploration policy. |
Remarks
Action estimates are randomized when this constructor is used.
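A construction sketch covering both overloads; EpsilonGreedyExplorationPolicy is a hypothetical placeholder for any IExplorationPolicy implementation.
// Hypothetical policy instance; any IExplorationPolicy implementation works here.
IExplorationPolicy policy = new EpsilonGreedyExplorationPolicy(epsilon: 0.1);

// Explicit control over randomization of the initial action estimates.
var learner = new QLearning(states: 16, actions: 4, explorationPolicy: policy, randomize: true);

// Overload without the flag; action estimates are randomized in this case.
var learnerWithDefaults = new QLearning(states: 16, actions: 4, explorationPolicy: policy);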
Properties
ActionsCount
Number of possible actions.
Declaration
public int ActionsCount { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
DiscountFactor
Discount factor, [0, 1].
Declaration
public double DiscountFactor { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
Discount factor for the expected cumulative reward. The value serves as a
multiplier for the expected future reward: with a value of 1, the expected
cumulative reward is not discounted at all, while smaller values reduce how
much the expected future reward contributes to the action-estimate updates.
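For reference, in the standard formulation (not specific to this implementation) the discount factor gamma weights future rewards in the expected return that the Q-values estimate:
G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \ldots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}
With DiscountFactor set to 0.9, for example, a reward expected two steps ahead contributes 0.9^2 = 0.81 of its value to the current estimate.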
ExplorationPolicy
Exploration policy.
Declaration
public IExplorationPolicy ExplorationPolicy { get; set; }
Property Value
Type | Description |
---|---|
IExplorationPolicy |
Remarks
Policy used to select actions.
LearningRate
Learning rate, [0, 1].
Declaration
public double LearningRate { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
The value determines the magnitude of the updates the Q-function receives
during learning: the greater the value, the larger each update to the action
estimates; the lower the value, the smaller each update.
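In the textbook update (shown in full under UpdateState below), the learning rate alpha blends the previous estimate with the new learning target, so it scales the size of each individual update rather than the number of updates:
Q_{new}(s, a) = (1 - \alpha) \, Q_{old}(s, a) + \alpha \cdot target
A value of 0 freezes learning entirely, while a value of 1 replaces the old estimate with the target.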
StatesCount
Number of possible states.
Declaration
public int StatesCount { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
Methods
GetAction(Int32)
Get next action from the specified state.
Declaration
public int GetAction(int state)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | state | Current state to get an action for. |
Returns
Type | Description |
---|---|
System.Int32 | Returns the action for the state. |
Remarks
The method returns an action according to the current exploration policy.
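A minimal call sketch; qLearning and currentState are assumed to come from the caller, as in the class-level example above.
// The returned value is an action index in [0, ActionsCount), selected for
// currentState by the configured ExplorationPolicy.
var action = qLearning.GetAction(currentState);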
UpdateState(Int32, Int32, Double, Int32)
Update the Q-function's value for the previous state-action pair.
Declaration
public void UpdateState(int previousState, int action, double reward, int nextState)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | previousState | Previous state. |
System.Int32 | action | Action which leads from the previous state to the next state. |
System.Double | reward | Reward value received by taking the specified action from the previous state. |
System.Int32 | nextState | Next state. |
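For orientation, the textbook off-policy TD (Q-Learning) update that corresponds to this method is sketched below; the q array, alpha and gamma are stand-ins for the learner's internal action estimates, LearningRate and DiscountFactor, not actual members of this class.
using System;

// Sketch of the standard Q-Learning update, not the actual implementation.
static void UpdateSketch(double[,] q, int previousState, int action,
                         double reward, int nextState,
                         double alpha, double gamma)
{
    // Best action estimate available from the next state (off-policy max).
    var bestNext = double.MinValue;
    for (var a = 0; a < q.GetLength(1); a++)
        bestNext = Math.Max(bestNext, q[nextState, a]);

    // Move the previous estimate toward the TD target by the learning rate.
    var tdError = reward + gamma * bestNext - q[previousState, action];
    q[previousState, action] += alpha * tdError;
}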