Class InfiniteQLearning
Q-learning algorithm with an infinite number of states.
Inheritance
System.Object
InfiniteQLearning
Inherited Members
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.ToString()
Namespace: Mars.Components.Services.Learning
Assembly: Mars.Components.dll
Syntax
[Serializable]
public class InfiniteQLearning
Remarks
The class provides an implementation of the Q-learning algorithm, known as
off-policy Temporal Difference control.
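For reference, each learning step applies the standard off-policy TD update, expressed here in terms of the class's LearningRate and DiscountFactor properties:

Q(s, a) ← Q(s, a) + LearningRate * (reward + DiscountFactor * max_a' Q(s', a') − Q(s, a))

where s is the previous state, a the action taken in it, and s' the resulting next state.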
Constructors
InfiniteQLearning(Int32, Int32, IExplorationPolicy)
Initializes a new instance of the InfiniteQLearning class.
Declaration
public InfiniteQLearning(int states, int actions, IExplorationPolicy explorationPolicy)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | states | Number of possible states. |
System.Int32 | actions | Number of possible actions. |
IExplorationPolicy | explorationPolicy | Exploration policy. |
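A typical construction might look as follows; this is an illustrative sketch only, and the CreateExplorationPolicy helper is assumed, not part of the Mars API:

```csharp
// Hypothetical usage sketch; the IExplorationPolicy instance is obtained elsewhere.
IExplorationPolicy policy = CreateExplorationPolicy(); // assumed helper, not part of Mars
var learner = new InfiniteQLearning(states: 100, actions: 4, explorationPolicy: policy)
{
    LearningRate = 0.5,    // how strongly each observation updates the estimates
    DiscountFactor = 0.9   // weight of the expected future reward
};
```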
Properties
ActionsCount
Number of possible actions.
Declaration
public int ActionsCount { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
DiscountFactor
Discount factor, [0, 1].
Declaration
public double DiscountFactor { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
Discount factor for the expected cumulative reward. The value serves as a
multiplier for the expected future reward: if set to 1, the expected
reward is not discounted at all, while smaller values reduce the
contribution of the expected future reward to the action-value updates.
ExplorationPolicy
Exploration policy.
Declaration
public IExplorationPolicy ExplorationPolicy { get; set; }
Property Value
Type | Description |
---|---|
IExplorationPolicy |
Remarks
The policy used to select actions.
LearningRate
Learning rate, [0, 1].
Declaration
public double LearningRate { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
The value determines how strongly each update affects the Q-function.
The greater the value, the more each new observation changes the current
estimate; a value of 0 means the estimates are not updated at all.
StatesCount
Number of possible states.
Declaration
public BigInteger StatesCount { get; }
Property Value
Type | Description |
---|---|
System.Numerics.BigInteger |
TriedStatesCount
Gets the number of states that have already been explored by the algorithm.
Declaration
public int TriedStatesCount { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
Methods
GetAction(Int32)
Gets the next action for the specified state.
Declaration
public int GetAction(int state)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | state | Current state to get an action for. |
Returns
Type | Description |
---|---|
System.Int32 | Returns the action for the state. |
Remarks
The method selects an action for the given state according to the
current exploration policy.
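Mars's built-in IExplorationPolicy implementations are not documented here. Purely as an illustration of the kind of policy GetAction delegates to, an epsilon-greedy strategy could be sketched as follows (the class below is hypothetical, not part of Mars.Components):

```csharp
using System;
using System.Linq;

// Hypothetical sketch of an epsilon-greedy exploration policy: with
// probability epsilon a random action is chosen (exploration); otherwise
// the action with the highest current estimate is taken (exploitation).
public class EpsilonGreedySketch
{
    private readonly double _epsilon; // exploration rate in [0, 1]
    private readonly Random _random;

    public EpsilonGreedySketch(double epsilon, int seed = 0)
    {
        _epsilon = epsilon;
        _random = new Random(seed);
    }

    // Picks an action index given the current action-value estimates.
    public int ChooseAction(double[] estimates)
    {
        if (_random.NextDouble() < _epsilon)
            return _random.Next(estimates.Length);        // explore
        return Array.IndexOf(estimates, estimates.Max()); // exploit
    }
}
```

With epsilon set to 0 the policy is purely greedy; a nonzero epsilon keeps the learner visiting actions whose estimates are currently low.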
UpdateState(Int32, Int32, Double, Int32)
Updates the Q-function's value for the previous state-action pair.
Declaration
public void UpdateState(int previousState, int action, double reward, int nextState)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | previousState | Previous state. |
System.Int32 | action | Action that leads from the previous state to the next state. |
System.Double | reward | Reward received for taking the specified action in the previous state. |
System.Int32 | nextState | Next state. |
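The Mars implementation itself is not reproduced here, but the semantics of UpdateState can be sketched with a self-contained, dictionary-backed Q-table, the usual representation when the state space is unbounded (all names below are illustrative, not the Mars API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch: a Q-table keyed by (state, action) pairs, so states
// never have to be enumerated up front; unseen entries default to 0.
public class QTableSketch
{
    private readonly Dictionary<(int, int), double> _q =
        new Dictionary<(int, int), double>();
    private readonly int _actions;

    public double LearningRate { get; set; } = 0.5;
    public double DiscountFactor { get; set; } = 0.9;

    public QTableSketch(int actions) => _actions = actions;

    public double GetValue(int state, int action) =>
        _q.TryGetValue((state, action), out var v) ? v : 0.0;

    // Off-policy TD update: move the estimate for (previousState, action)
    // toward reward + DiscountFactor * max over actions of Q(nextState, a).
    public void UpdateState(int previousState, int action, double reward, int nextState)
    {
        double bestNext = Enumerable.Range(0, _actions)
                                    .Max(a => GetValue(nextState, a));
        double old = GetValue(previousState, action);
        _q[(previousState, action)] =
            old + LearningRate * (reward + DiscountFactor * bestNext - old);
    }
}
```

Starting from all-zero estimates with LearningRate 0.5, a single update with reward 1 moves the entry to 0 + 0.5 * (1 + 0.9 * 0 - 0) = 0.5, matching the update rule in the class remarks above.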