Class QLearning
Q-Learning reinforcement learning algorithm.
Inheritance
System.Object
QLearning
Inherited Members
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.ToString()
Namespace: Mars.Components.Services.Learning
Assembly: Mars.Components.dll
Syntax
[Serializable]
public class QLearning
Remarks
The class provides an implementation of the Q-Learning algorithm, known as
off-policy Temporal Difference control.
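The following is a minimal usage sketch against the members documented below. EpsilonGreedyExplorationPolicy, ObserveState and Step are hypothetical stand-ins for an IExplorationPolicy implementation and an environment; they are not part of this assembly.
using Mars.Components.Services.Learning;

// Hypothetical exploration policy and environment hooks; replace them with the
// IExplorationPolicy implementation and state/reward source of your model.
var policy = new EpsilonGreedyExplorationPolicy(epsilon: 0.1);
var qLearning = new QLearning(states: 16, actions: 4, explorationPolicy: policy, randomize: true)
{
    LearningRate = 0.25,   // step size of each Q-value update, [0, 1]
    DiscountFactor = 0.95  // weight given to expected future reward, [0, 1]
};

for (var episode = 0; episode < 1000; episode++)
{
    var state = ObserveState();                  // hypothetical environment hook
    var done = false;
    while (!done)
    {
        var action = qLearning.GetAction(state); // action chosen via the exploration policy
        var (nextState, reward, terminal) = Step(state, action); // hypothetical environment step
        qLearning.UpdateState(state, action, reward, nextState);
        state = nextState;
        done = terminal;
    }
}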
Constructors
QLearning(Int32, Int32, IExplorationPolicy, Boolean)
Initializes a new instance of the QLearning class.
Declaration
public QLearning(int states, int actions, IExplorationPolicy explorationPolicy, bool randomize)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | states | Number of possible states. |
System.Int32 | actions | Number of possible actions. |
IExplorationPolicy | explorationPolicy | Exploration policy. |
System.Boolean | randomize | Whether to randomize initial action estimates. |
Remarks
The randomize parameter specifies whether initial action estimates should be randomized
with small values. Randomizing action estimates can be useful when greedy exploration
policies are used, since it prevents the same action from always being selected when several actions have equal estimates.
QLearning(Int32, Int32, IExplorationPolicy)
Initializes a new instance of the QLearning class.
Declaration
public QLearning(int states, int actions, IExplorationPolicy explorationPolicy)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | states | Number of possible states. |
System.Int32 | actions | Number of possible actions. |
IExplorationPolicy | explorationPolicy | Exploration policy. |
Remarks
Action estimates are randomized when this constructor is used.
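A construction sketch covering both overloads; EpsilonGreedyExplorationPolicy is a hypothetical placeholder for any IExplorationPolicy implementation.
// Hypothetical policy instance; any IExplorationPolicy implementation works here.
IExplorationPolicy policy = new EpsilonGreedyExplorationPolicy(epsilon: 0.1);

// Explicit control over randomization of the initial action estimates.
var learner = new QLearning(states: 16, actions: 4, explorationPolicy: policy, randomize: true);

// Overload without the flag; action estimates are randomized in this case.
var learnerWithDefaults = new QLearning(states: 16, actions: 4, explorationPolicy: policy);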
Properties
ActionsCount
Number of possible actions.
Declaration
public int ActionsCount { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
DiscountFactor
Discount factor, [0, 1].
Declaration
public double DiscountFactor { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
Discount factor for the expected cumulative reward. The value serves as a
multiplier for the expected future reward: with a value of 1, the expected
cumulative reward is not discounted at all, while smaller values reduce how
much the expected future reward contributes to the action-estimate updates.
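For reference, in the standard formulation (not specific to this implementation) the discount factor gamma weights future rewards in the expected return that the Q-values estimate:
G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \ldots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}
With DiscountFactor set to 0.9, for example, a reward expected two steps ahead contributes 0.9^2 = 0.81 of its value to the current estimate.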
ExplorationPolicy
Exploration policy.
Declaration
public IExplorationPolicy ExplorationPolicy { get; set; }
Property Value
Type | Description |
---|---|
IExplorationPolicy |
Remarks
Policy used to select actions.
LearningRate
Learning rate, [0, 1].
Declaration
public double LearningRate { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
The value determines the magnitude of the updates the Q-function receives
during learning: the greater the value, the larger each update to the action
estimates; the lower the value, the smaller each update.
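In the textbook update (shown in full under UpdateState below), the learning rate alpha blends the previous estimate with the new learning target, so it scales the size of each individual update rather than the number of updates:
Q_{new}(s, a) = (1 - \alpha) \, Q_{old}(s, a) + \alpha \cdot target
A value of 0 freezes learning entirely, while a value of 1 replaces the old estimate with the target.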
StatesCount
Number of possible states.
Declaration
public int StatesCount { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
Methods
GetAction(Int32)
Get next action from the specified state.
Declaration
public int GetAction(int state)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | state | Current state to get an action for. |
Returns
Type | Description |
---|---|
System.Int32 | Returns the action for the state. |
Remarks
The method returns an action according to the current exploration policy.
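A minimal call sketch; qLearning and currentState are assumed to come from the caller, as in the class-level example above.
// The returned value is an action index in [0, ActionsCount), selected for
// currentState by the configured ExplorationPolicy.
var action = qLearning.GetAction(currentState);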
UpdateState(Int32, Int32, Double, Int32)
Update the Q-function's value for the previous state-action pair.
Declaration
public void UpdateState(int previousState, int action, double reward, int nextState)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | previousState | Previous state. |
System.Int32 | action | Action which leads from the previous state to the next state. |
System.Double | reward | Reward value received by taking the specified action from the previous state. |
System.Int32 | nextState | Next state. |
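For orientation, the textbook off-policy TD (Q-Learning) update that corresponds to this method is sketched below; the q array, alpha and gamma are stand-ins for the learner's internal action estimates, LearningRate and DiscountFactor, not actual members of this class.
using System;

// Sketch of the standard Q-Learning update, not the actual implementation.
static void UpdateSketch(double[,] q, int previousState, int action,
                         double reward, int nextState,
                         double alpha, double gamma)
{
    // Best action estimate available from the next state (off-policy max).
    var bestNext = double.MinValue;
    for (var a = 0; a < q.GetLength(1); a++)
        bestNext = Math.Max(bestNext, q[nextState, a]);

    // Move the previous estimate toward the TD target by the learning rate.
    var tdError = reward + gamma * bestNext - q[previousState, action];
    q[previousState, action] += alpha * tdError;
}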