Class Sarsa
Sarsa learning algorithm.
Inheritance
System.Object
Sarsa
Inherited Members
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.ToString()
Namespace: Mars.Components.Services.Learning
Assembly: Mars.Components.dll
Syntax
[Serializable]
public class Sarsa
Remarks
The class provides an implementation of the Sarsa algorithm, known as
on-policy Temporal Difference control.
Examples
The following example shows how to learn a model using reinforcement learning through the Sarsa algorithm.
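The snippet below is a minimal sketch of such a learning loop. The environment type (GridWorld with Reset and Step) and the EpsilonGreedyPolicy used to drive exploration are hypothetical placeholders; only the Sarsa members documented on this page are taken from the class itself.

```csharp
using Mars.Components.Services.Learning;

// Hypothetical exploration policy and environment; only the Sarsa API
// (constructor, GetAction, UpdateState, LearningRate, DiscountFactor)
// is taken from this reference page.
IExplorationPolicy policy = new EpsilonGreedyPolicy(epsilon: 0.1); // assumed policy type
var sarsa = new Sarsa(states: 100, actions: 4, explorationPolicy: policy)
{
    LearningRate = 0.25,
    DiscountFactor = 0.95
};

var world = new GridWorld(); // hypothetical environment with 100 states and 4 actions

for (var episode = 0; episode < 1000; episode++)
{
    int state = world.Reset();            // hypothetical: returns the start state index
    int action = sarsa.GetAction(state);  // first action chosen by the exploration policy

    while (true)
    {
        // Hypothetical: apply the action, observe reward, next state and episode end.
        (int nextState, double reward, bool done) = world.Step(action);

        if (done)
        {
            // Terminal transition: update without a next state-action pair.
            sarsa.UpdateState(state, action, reward);
            break;
        }

        // On-policy: pick the next action first, then update with it.
        int nextAction = sarsa.GetAction(nextState);
        sarsa.UpdateState(state, action, reward, nextState, nextAction);

        state = nextState;
        action = nextAction;
    }
}
```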
Constructors
Sarsa(Int32, Int32, IExplorationPolicy, Boolean)
Initializes a new instance of the Sarsa class.
Declaration
public Sarsa(int states, int actions, IExplorationPolicy explorationPolicy, bool randomize = true)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | states | Number of possible states. |
System.Int32 | actions | Number of possible actions. |
IExplorationPolicy | explorationPolicy | Exploration policy. |
System.Boolean | randomize | Randomize action estimates or not. |
Remarks
The randomize parameter specifies whether initial action estimates should be randomized
with small values. Randomizing action values can be useful when greedy exploration
policies are used; in that case it ensures that the same action is not always chosen among actions with equal estimates.
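A short sketch of the point above; the two exploration policy types are hypothetical placeholders for whatever IExplorationPolicy implementations are available in your setup:

```csharp
// With a greedy policy, randomized initial estimates help break ties,
// so the learner does not always pick the same action among equals.
var greedyLearner = new Sarsa(states: 100, actions: 4,
                              explorationPolicy: new GreedyPolicy(),              // hypothetical type
                              randomize: true);

// With a sufficiently exploratory policy, randomization can be turned off.
var exploringLearner = new Sarsa(states: 100, actions: 4,
                                 explorationPolicy: new EpsilonGreedyPolicy(0.2), // hypothetical type
                                 randomize: false);
```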
See Also
Properties
ActionsCount
Number of possible actions.
Declaration
public int ActionsCount { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
See Also
DiscountFactor
Discount factor, [0, 1].
Declaration
public double DiscountFactor { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
Discount factor for the expected cumulative reward. The value serves as a
multiplier for the expected future reward: if it is set to 1, the expected
cumulative reward is not discounted at all, and as the value decreases, a
smaller share of the expected future reward is used when updating action
estimates.
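A small worked illustration of the effect, independent of the Sarsa class itself: with a reward of 1 on each of the next three steps, a discount factor of 0.5 weights them as 1 + 0.5 + 0.25.

```csharp
// Sketch: how the discount factor weights a sequence of future rewards.
double gamma = 0.5;
double[] futureRewards = { 1.0, 1.0, 1.0 };

double discountedReturn = 0.0;
for (var t = 0; t < futureRewards.Length; t++)
{
    discountedReturn += Math.Pow(gamma, t) * futureRewards[t];
}
// discountedReturn == 1.75; with gamma = 1.0 the same sequence would sum to 3.0
```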
See Also
ExplorationPolicy
Exploration policy.
Declaration
public IExplorationPolicy ExplorationPolicy { get; set; }
Property Value
Type | Description |
---|---|
IExplorationPolicy |
Remarks
The policy used to select actions.
See Also
LearningRate
Learning rate, [0, 1].
Declaration
public double LearningRate { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
The value determines how strongly the Q-function is adjusted on each update
during learning: the greater the value, the larger each update; the lower the
value, the smaller each update.
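A small worked illustration of how the learning rate scales a single update; the numbers are arbitrary and the target value simply stands in for whatever the algorithm computes from the observed reward:

```csharp
// Sketch: the learning rate controls how far one update moves the estimate.
double oldEstimate = 2.0;    // current action estimate Q(s, a)
double target = 5.0;         // example update target, e.g. reward + discounted next estimate
double learningRate = 0.25;

double newEstimate = oldEstimate + learningRate * (target - oldEstimate);
// newEstimate == 2.75; with learningRate = 1.0 the estimate would jump straight to 5.0
```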
See Also
StatesCount
Number of possible states.
Declaration
public int StatesCount { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
See Also
Methods
GetAction(Int32)
Get next action from the specified state.
Declaration
public int GetAction(int state)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | state | Current state to get an action for. |
Returns
Type | Description |
---|---|
System.Int32 | Returns the action for the state. |
Remarks
The method returns an action according to the current
exploration policy.
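A minimal usage sketch; the exploration policy type and the state index are illustrative placeholders:

```csharp
var sarsa = new Sarsa(states: 100, actions: 4,
                      explorationPolicy: new EpsilonGreedyPolicy(0.1)); // hypothetical policy type

int currentState = 42;                       // arbitrary example state index
int action = sarsa.GetAction(currentState);  // 0 <= action < sarsa.ActionsCount
```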
See Also
UpdateState(Int32, Int32, Double, Int32, Int32)
Update Q-function's value for the previous state-action pair.
Declaration
public void UpdateState(int previousState, int previousAction, double reward, int nextState, int nextAction)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | previousState | Previous state. |
System.Int32 | previousAction | Action which led from the previous state to the next state. |
System.Double | reward | Reward value received by taking the specified action from the previous state. |
System.Int32 | nextState | Next state. |
System.Int32 | nextAction | Next action. |
Remarks
Updates the Q-function's value for the previous state-action pair when the
next state is non-terminal.
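The textbook on-policy TD update for such a non-terminal transition is sketched below, with a plain 2-D array standing in for the learner's internal estimates; this is an illustration of the rule, not the literal source of the method:

```csharp
// Sketch of the standard SARSA update for a non-terminal transition.
double[,] qValues = new double[100, 4];      // stand-in for the internal action estimates
int previousState = 3, previousAction = 1;
int nextState = 7, nextAction = 2;
double reward = 1.0, learningRate = 0.25, discountFactor = 0.95;

qValues[previousState, previousAction] +=
    learningRate * (reward
                    + discountFactor * qValues[nextState, nextAction]
                    - qValues[previousState, previousAction]);
```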
See Also
UpdateState(Int32, Int32, Double)
Update Q-function's value for the previous state-action pair.
Declaration
public void UpdateState(int previousState, int previousAction, double reward)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | previousState | Previous state. |
System.Int32 | previousAction | Action which led from the previous state to the next state. |
System.Double | reward | Reward value received by taking the specified action from the previous state. |
Remarks
Updates the Q-function's value for the previous state-action pair when the
next state is terminal.
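For a terminal transition there is no next state-action pair, so the expected-future-reward term drops out; the textbook rule then reduces to the sketch below (again an illustration, not the literal source):

```csharp
// Sketch of the SARSA update when the next state is terminal.
double[,] qValues = new double[100, 4];      // stand-in for the internal action estimates
int previousState = 3, previousAction = 1;
double reward = 1.0, learningRate = 0.25;

qValues[previousState, previousAction] +=
    learningRate * (reward - qValues[previousState, previousAction]);
```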