Class InfiniteQLearning
Q-learning algorithm with an infinite number of states.
Inheritance
System.Object
InfiniteQLearning
Inherited Members
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.ToString()
Namespace: Mars.Components.Services.Learning
Assembly: Mars.Components.dll
Syntax
[Serializable]
public class InfiniteQLearning
Remarks
The class provides an implementation of the Q-learning algorithm, known as
off-policy Temporal Difference control.
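For reference, each learning step applies the standard off-policy TD update, expressed here in terms of the class's LearningRate and DiscountFactor properties:

Q(s, a) ← Q(s, a) + LearningRate * (reward + DiscountFactor * max_a' Q(s', a') − Q(s, a))

where s is the previous state, a the action taken in it, and s' the resulting next state.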
Constructors
InfiniteQLearning(Int32, Int32, IExplorationPolicy)
Initializes a new instance of the InfiniteQLearning class.
Declaration
public InfiniteQLearning(int states, int actions, IExplorationPolicy explorationPolicy)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | states | Number of possible states. |
System.Int32 | actions | Number of possible actions. |
IExplorationPolicy | explorationPolicy | Exploration policy. |
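A typical construction might look as follows; this is an illustrative sketch only, and the CreateExplorationPolicy helper is assumed, not part of the Mars API:

```csharp
// Hypothetical usage sketch; the IExplorationPolicy instance is obtained elsewhere.
IExplorationPolicy policy = CreateExplorationPolicy(); // assumed helper, not part of Mars
var learner = new InfiniteQLearning(states: 100, actions: 4, explorationPolicy: policy)
{
    LearningRate = 0.5,    // how strongly each observation updates the estimates
    DiscountFactor = 0.9   // weight of the expected future reward
};
```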
Properties
ActionsCount
Number of possible actions.
Declaration
public int ActionsCount { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
DiscountFactor
Discount factor, [0, 1].
Declaration
public double DiscountFactor { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
Discount factor for the expected cumulative reward. The value serves as a
multiplier for the expected future reward: if set to 1, the expected
reward is not discounted at all, while smaller values reduce the
contribution of the expected future reward to the action-value updates.
ExplorationPolicy
Exploration policy.
Declaration
public IExplorationPolicy ExplorationPolicy { get; set; }
Property Value
Type | Description |
---|---|
IExplorationPolicy |
Remarks
The policy used to select actions.
LearningRate
Learning rate, [0, 1].
Declaration
public double LearningRate { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
The value determines how strongly each update affects the Q-function.
The greater the value, the more each new observation changes the current
estimate; a value of 0 means the estimates are not updated at all.
StatesCount
Number of possible states.
Declaration
public BigInteger StatesCount { get; }
Property Value
Type | Description |
---|---|
System.Numerics.BigInteger |
TriedStatesCount
Gets the number of states that have already been explored by the algorithm.
Declaration
public int TriedStatesCount { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
Methods
GetAction(Int32)
Gets the next action for the specified state.
Declaration
public int GetAction(int state)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | state | Current state to get an action for. |
Returns
Type | Description |
---|---|
System.Int32 | Returns the action for the state. |
Remarks
The method selects an action for the given state according to the
current exploration policy.
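Mars's built-in IExplorationPolicy implementations are not documented here. Purely as an illustration of the kind of policy GetAction delegates to, an epsilon-greedy strategy could be sketched as follows (the class below is hypothetical, not part of Mars.Components):

```csharp
using System;
using System.Linq;

// Hypothetical sketch of an epsilon-greedy exploration policy: with
// probability epsilon a random action is chosen (exploration); otherwise
// the action with the highest current estimate is taken (exploitation).
public class EpsilonGreedySketch
{
    private readonly double _epsilon; // exploration rate in [0, 1]
    private readonly Random _random;

    public EpsilonGreedySketch(double epsilon, int seed = 0)
    {
        _epsilon = epsilon;
        _random = new Random(seed);
    }

    // Picks an action index given the current action-value estimates.
    public int ChooseAction(double[] estimates)
    {
        if (_random.NextDouble() < _epsilon)
            return _random.Next(estimates.Length);        // explore
        return Array.IndexOf(estimates, estimates.Max()); // exploit
    }
}
```

With epsilon set to 0 the policy is purely greedy; a nonzero epsilon keeps the learner visiting actions whose estimates are currently low.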
UpdateState(Int32, Int32, Double, Int32)
Updates the Q-function's value for the previous state-action pair.
Declaration
public void UpdateState(int previousState, int action, double reward, int nextState)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | previousState | Previous state. |
System.Int32 | action | Action that leads from the previous state to the next state. |
System.Double | reward | Reward received for taking the specified action in the previous state. |
System.Int32 | nextState | Next state. |
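The Mars implementation itself is not reproduced here, but the semantics of UpdateState can be sketched with a self-contained, dictionary-backed Q-table, the usual representation when the state space is unbounded (all names below are illustrative, not the Mars API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch: a Q-table keyed by (state, action) pairs, so states
// never have to be enumerated up front; unseen entries default to 0.
public class QTableSketch
{
    private readonly Dictionary<(int, int), double> _q =
        new Dictionary<(int, int), double>();
    private readonly int _actions;

    public double LearningRate { get; set; } = 0.5;
    public double DiscountFactor { get; set; } = 0.9;

    public QTableSketch(int actions) => _actions = actions;

    public double GetValue(int state, int action) =>
        _q.TryGetValue((state, action), out var v) ? v : 0.0;

    // Off-policy TD update: move the estimate for (previousState, action)
    // toward reward + DiscountFactor * max over actions of Q(nextState, a).
    public void UpdateState(int previousState, int action, double reward, int nextState)
    {
        double bestNext = Enumerable.Range(0, _actions)
                                    .Max(a => GetValue(nextState, a));
        double old = GetValue(previousState, action);
        _q[(previousState, action)] =
            old + LearningRate * (reward + DiscountFactor * bestNext - old);
    }
}
```

Starting from all-zero estimates with LearningRate 0.5, a single update with reward 1 moves the entry to 0 + 0.5 * (1 + 0.9 * 0 - 0) = 0.5, matching the update rule in the class remarks above.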