Class Sarsa
Sarsa learning algorithm.
Inheritance
System.Object
Sarsa
Inherited Members
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.ToString()
Namespace: Mars.Components.Services.Learning
Assembly: Mars.Components.dll
Syntax
[Serializable]
public class Sarsa
Remarks
The class provides an implementation of the Sarsa algorithm, known as
on-policy Temporal Difference control.
Examples
The following example shows how to learn a model using reinforcement learning through the Sarsa algorithm.
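The snippet below is a minimal sketch of such a learning loop. The environment type (GridWorld with Reset and Step) and the EpsilonGreedyPolicy used to drive exploration are hypothetical placeholders; only the Sarsa members documented on this page are taken from the class itself.

```csharp
using Mars.Components.Services.Learning;

// Hypothetical exploration policy and environment; only the Sarsa API
// (constructor, GetAction, UpdateState, LearningRate, DiscountFactor)
// is taken from this reference page.
IExplorationPolicy policy = new EpsilonGreedyPolicy(epsilon: 0.1); // assumed policy type
var sarsa = new Sarsa(states: 100, actions: 4, explorationPolicy: policy)
{
    LearningRate = 0.25,
    DiscountFactor = 0.95
};

var world = new GridWorld(); // hypothetical environment with 100 states and 4 actions

for (var episode = 0; episode < 1000; episode++)
{
    int state = world.Reset();            // hypothetical: returns the start state index
    int action = sarsa.GetAction(state);  // first action chosen by the exploration policy

    while (true)
    {
        // Hypothetical: apply the action, observe reward, next state and episode end.
        (int nextState, double reward, bool done) = world.Step(action);

        if (done)
        {
            // Terminal transition: update without a next state-action pair.
            sarsa.UpdateState(state, action, reward);
            break;
        }

        // On-policy: pick the next action first, then update with it.
        int nextAction = sarsa.GetAction(nextState);
        sarsa.UpdateState(state, action, reward, nextState, nextAction);

        state = nextState;
        action = nextAction;
    }
}
```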
Constructors
Sarsa(Int32, Int32, IExplorationPolicy, Boolean)
Initializes a new instance of the Sarsa class.
Declaration
public Sarsa(int states, int actions, IExplorationPolicy explorationPolicy, bool randomize = true)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | states | Number of possible states. |
System.Int32 | actions | Number of possible actions. |
IExplorationPolicy | explorationPolicy | Exploration policy. |
System.Boolean | randomize | Randomize action estimates or not. |
Remarks
The randomize parameter specifies whether initial action estimates should be randomized
with small values. Randomizing action values can be useful when greedy exploration
policies are used; in that case it ensures that the same action is not always chosen among actions with equal estimates.
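A short sketch of the point above; the two exploration policy types are hypothetical placeholders for whatever IExplorationPolicy implementations are available in your setup:

```csharp
// With a greedy policy, randomized initial estimates help break ties,
// so the learner does not always pick the same action among equals.
var greedyLearner = new Sarsa(states: 100, actions: 4,
                              explorationPolicy: new GreedyPolicy(),              // hypothetical type
                              randomize: true);

// With a sufficiently exploratory policy, randomization can be turned off.
var exploringLearner = new Sarsa(states: 100, actions: 4,
                                 explorationPolicy: new EpsilonGreedyPolicy(0.2), // hypothetical type
                                 randomize: false);
```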
See Also
Properties
ActionsCount
Number of possible actions.
Declaration
public int ActionsCount { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
See Also
DiscountFactor
Discount factor, [0, 1].
Declaration
public double DiscountFactor { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
Discount factor for the expected cumulative reward. The value serves as a
multiplier for the expected future reward: if it is set to 1, the expected
cumulative reward is not discounted at all, and as the value decreases, a
smaller share of the expected future reward is used when updating action
estimates.
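A small worked illustration of the effect, independent of the Sarsa class itself: with a reward of 1 on each of the next three steps, a discount factor of 0.5 weights them as 1 + 0.5 + 0.25.

```csharp
// Sketch: how the discount factor weights a sequence of future rewards.
double gamma = 0.5;
double[] futureRewards = { 1.0, 1.0, 1.0 };

double discountedReturn = 0.0;
for (var t = 0; t < futureRewards.Length; t++)
{
    discountedReturn += Math.Pow(gamma, t) * futureRewards[t];
}
// discountedReturn == 1.75; with gamma = 1.0 the same sequence would sum to 3.0
```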
See Also
ExplorationPolicy
Exploration policy.
Declaration
public IExplorationPolicy ExplorationPolicy { get; set; }
Property Value
Type | Description |
---|---|
IExplorationPolicy |
Remarks
The policy used to select actions.
See Also
LearningRate
Learning rate, [0, 1].
Declaration
public double LearningRate { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
The value determines how strongly the Q-function is adjusted on each update
during learning: the greater the value, the larger each update; the lower the
value, the smaller each update.
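A small worked illustration of how the learning rate scales a single update; the numbers are arbitrary and the target value simply stands in for whatever the algorithm computes from the observed reward:

```csharp
// Sketch: the learning rate controls how far one update moves the estimate.
double oldEstimate = 2.0;    // current action estimate Q(s, a)
double target = 5.0;         // example update target, e.g. reward + discounted next estimate
double learningRate = 0.25;

double newEstimate = oldEstimate + learningRate * (target - oldEstimate);
// newEstimate == 2.75; with learningRate = 1.0 the estimate would jump straight to 5.0
```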
See Also
StatesCount
Number of possible states.
Declaration
public int StatesCount { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
See Also
Methods
GetAction(Int32)
Get next action from the specified state.
Declaration
public int GetAction(int state)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | state | Current state to get an action for. |
Returns
Type | Description |
---|---|
System.Int32 | Returns the action for the state. |
Remarks
The method returns an action according to the current
exploration policy.
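A minimal usage sketch; the exploration policy type and the state index are illustrative placeholders:

```csharp
var sarsa = new Sarsa(states: 100, actions: 4,
                      explorationPolicy: new EpsilonGreedyPolicy(0.1)); // hypothetical policy type

int currentState = 42;                       // arbitrary example state index
int action = sarsa.GetAction(currentState);  // 0 <= action < sarsa.ActionsCount
```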
See Also
UpdateState(Int32, Int32, Double, Int32, Int32)
Update Q-function's value for the previous state-action pair.
Declaration
public void UpdateState(int previousState, int previousAction, double reward, int nextState, int nextAction)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | previousState | Previous state. |
System.Int32 | previousAction | Action which led from the previous state to the next state. |
System.Double | reward | Reward value received by taking the specified action from the previous state. |
System.Int32 | nextState | Next state. |
System.Int32 | nextAction | Next action. |
Remarks
Updates the Q-function's value for the previous state-action pair when the
next state is non-terminal.
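The textbook on-policy TD update for such a non-terminal transition is sketched below, with a plain 2-D array standing in for the learner's internal estimates; this is an illustration of the rule, not the literal source of the method:

```csharp
// Sketch of the standard SARSA update for a non-terminal transition.
double[,] qValues = new double[100, 4];      // stand-in for the internal action estimates
int previousState = 3, previousAction = 1;
int nextState = 7, nextAction = 2;
double reward = 1.0, learningRate = 0.25, discountFactor = 0.95;

qValues[previousState, previousAction] +=
    learningRate * (reward
                    + discountFactor * qValues[nextState, nextAction]
                    - qValues[previousState, previousAction]);
```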
See Also
UpdateState(Int32, Int32, Double)
Update Q-function's value for the previous state-action pair.
Declaration
public void UpdateState(int previousState, int previousAction, double reward)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | previousState | Previous state. |
System.Int32 | previousAction | Action which led from the previous state to the next state. |
System.Double | reward | Reward value received by taking the specified action from the previous state. |
Remarks
Updates the Q-function's value for the previous state-action pair when the
next state is terminal.
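For a terminal transition there is no next state-action pair, so the expected-future-reward term drops out; the textbook rule then reduces to the sketch below (again an illustration, not the literal source):

```csharp
// Sketch of the SARSA update when the next state is terminal.
double[,] qValues = new double[100, 4];      // stand-in for the internal action estimates
int previousState = 3, previousAction = 1;
double reward = 1.0, learningRate = 0.25;

qValues[previousState, previousAction] +=
    learningRate * (reward - qValues[previousState, previousAction]);
```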