Class BoltzmannExploration
Boltzmann distribution exploration policy.
Inheritance
System.Object
BoltzmannExploration
Implements
IExplorationPolicy
Inherited Members
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.ToString()
Namespace: Mars.Components.Services.Explorations
Assembly: Mars.Components.dll
Syntax
[Serializable]
public class BoltzmannExploration : IExplorationPolicy
Remarks
The class implements an exploration policy based on the Boltzmann distribution. Under this policy, action a in state s is selected with the following probability:
\[
p(s, a) = \frac{\exp\left( Q(s, a) / t \right)}{\sum_{b} \exp\left( Q(s, b) / t \right)}
\]
where Q(s, a) is the estimate (usefulness) of action a in state s, the sum runs over all actions b available in state s, and t is the Temperature.
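The formula above can be sketched in C# as follows. This is only an illustration, not the library's actual implementation: the class BoltzmannSketch and its Probabilities method are hypothetical names introduced here, and subtracting the largest estimate before exponentiating is an assumed numerical-stability measure that leaves the resulting probabilities unchanged.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Mars.Components.
public static class BoltzmannSketch
{
    // Converts the action estimates Q(s, .) of one state into Boltzmann
    // selection probabilities for the given temperature t.
    public static double[] Probabilities(double[] actionEstimates, double temperature)
    {
        if (actionEstimates == null || actionEstimates.Length == 0)
            throw new ArgumentException("At least one estimate is required.", nameof(actionEstimates));
        if (temperature <= 0)
            throw new ArgumentOutOfRangeException(nameof(temperature), "Temperature must be greater than 0.");

        // Shifting every estimate by the maximum cancels out in the ratio,
        // but keeps Math.Exp from overflowing for large Q / t values.
        double max = actionEstimates.Max();
        double[] weights = actionEstimates
            .Select(q => Math.Exp((q - max) / temperature))
            .ToArray();

        // Normalize so that the probabilities sum to 1.
        double sum = weights.Sum();
        return weights.Select(w => w / sum).ToArray();
    }
}
```

For instance, BoltzmannSketch.Probabilities(new[] { 1.0, 2.0 }, 1.0) yields two probabilities that sum to 1, with the second action (the higher estimate) being the more likely one.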
Constructors
BoltzmannExploration(Double)
Initializes a new instance of the BoltzmannExploration class.
Declaration
public BoltzmannExploration(double temperature)
Parameters
Type | Name | Description |
---|---|---|
System.Double | temperature | Temperature parameter of Boltzmann distribution. |
Properties
Temperature
Temperature parameter of Boltzmann distribution. Should be greater than 0.
Declaration
public double Temperature { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
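The property sets the balance between exploration and greedy action selection. A low temperature makes the policy more greedy, so the action with the highest estimate is chosen almost every time. A high temperature makes the selection probabilities closer to uniform, so the policy explores more.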
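As a worked illustration, assume a state with two actions whose estimates are Q(s, a_1) = 1 and Q(s, a_2) = 2 (values chosen here purely as an example). The Boltzmann probability of the better action a_2 then simplifies to 1 / (1 + exp(-1 / t)), which gives:

\[
\begin{aligned}
t = 0.1: &\quad p(s, a_2) = \frac{1}{1 + e^{-10}} \approx 0.99995 \\
t = 1: &\quad p(s, a_2) = \frac{1}{1 + e^{-1}} \approx 0.731 \\
t = 10: &\quad p(s, a_2) = \frac{1}{1 + e^{-0.1}} \approx 0.525
\end{aligned}
\]

At t = 0.1 the choice is nearly deterministic, while at t = 10 both actions are picked almost equally often.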
Methods
ChooseAction(Double[])
Choose an action.
Declaration
public int ChooseAction(double[] actionEstimates)
Parameters
Type | Name | Description |
---|---|---|
System.Double[] | actionEstimates | Action estimates. |
Returns
Type | Description |
---|---|
System.Int32 | Index of the selected action. |
Remarks
The method chooses an action based on the provided estimates. The
estimates can be any measure of an action's usefulness
(expected total reward, discounted reward, etc.).
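One plausible way to choose an action from such estimates is roulette-wheel sampling over the Boltzmann weights, sketched below. This is a minimal sketch under stated assumptions, not the source of ChooseAction: the class BoltzmannSamplingSketch, its explicit temperature and Random parameters, and the max-shift used for numerical stability are all hypothetical choices made for this example.

```csharp
using System;

// Hypothetical helper, not part of Mars.Components.
public static class BoltzmannSamplingSketch
{
    // Samples an action index with probability proportional to exp(Q / t).
    public static int ChooseAction(double[] actionEstimates, double temperature, Random random)
    {
        int count = actionEstimates.Length;

        // Find the largest estimate so the exponent below is never positive.
        double max = double.NegativeInfinity;
        foreach (double q in actionEstimates)
            max = Math.Max(max, q);

        // Unnormalized Boltzmann weights and their sum.
        double[] weights = new double[count];
        double sum = 0;
        for (int i = 0; i < count; i++)
        {
            weights[i] = Math.Exp((actionEstimates[i] - max) / temperature);
            sum += weights[i];
        }

        // Roulette wheel: draw a point in [0, sum) and return the action
        // whose cumulative weight interval contains it.
        double threshold = random.NextDouble() * sum;
        double cumulative = 0;
        for (int i = 0; i < count; i++)
        {
            cumulative += weights[i];
            if (threshold < cumulative)
                return i;
        }

        // Only reachable through floating-point round-off.
        return count - 1;
    }
}
```

With the documented class itself, the equivalent call would be new BoltzmannExploration(temperature).ChooseAction(actionEstimates), where the temperature is supplied through the constructor or the Temperature property rather than as a method argument.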