Optimization

Hype provides highly configurable and modular gradient-based optimization functionality, working similarly to many other machine learning libraries.

Here's the novelty:

Thanks to nested AD, gradient-based optimization can be combined with any code, including code that internally takes derivatives of a function to produce its output. In other words, you can optimize the value of a function that is itself internally optimizing another function, or using derivatives for any other purpose (e.g., running particle simulations, adaptive control), nested to any level.

In such a compositional optimization setting, all higher-order derivatives that arise are handled for you through nested instantiations of forward and/or reverse AD. You only need to write your algorithms as usual, implementing just the regular forward computation.
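
To make this concrete, here is a minimal sketch (not part of the example below) of nesting with DiffSharp's diff operator, the AD engine underlying Hype; the function being differentiated itself takes a derivative inside its body, and the nesting is resolved correctly.

open DiffSharp.AD.Float32

// d/dx [ x * (d/dy (x * y) at y = 2) ] = d/dx (x * x) = 2x, so the value at x = 3 is 6
let nested = diff (fun x -> x * diff (fun y -> x * y) (D 2.f)) (D 3.f)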

Let's explain this through a basic example from the article "Jeffrey Mark Siskind and Barak A. Pearlmutter. Nesting forward-mode AD in a functional framework. Higher-Order and Symbolic Computation 21(4):361–76, 2008. doi:10.1007/s10990-008-9037-1": a physics simulation that internally uses the gradient of an electric potential has a parameter that is optimized with Newton's method, which in turn uses the Hessian of an error, requiring third-order nesting of derivatives.

Optimizing a physics simulation

Consider a charged particle traveling in a plane with position \(\mathbf{x}(t)\), velocity \(\dot{\mathbf{x}}(t)\), initial position \(\mathbf{x}(0)=(0, 8)\), and initial velocity \(\dot{\mathbf{x}}(0)=(0.75, 0)\). The particle is accelerated by an electric field formed by a pair of repulsive bodies,

\[ p(\mathbf{x}; w) = \| \mathbf{x} - (10, 10 - w)\|^{-1} + \| \mathbf{x} - (10, 0)\|^{-1}\]

where \(w\) is a parameter of this simple particle simulation, adjusting the location of one of the repulsive bodies.

We can simulate the time evolution of this system using naive Euler ODE integration,

\[ \begin{eqnarray*} \ddot{\mathbf{x}}(t) &=& \left. -\nabla_{\mathbf{x}} p(\mathbf{x}; w) \right|_{\mathbf{x}=\mathbf{x}(t)}\\ \dot{\mathbf{x}}(t + \Delta t) &=& \dot{\mathbf{x}}(t) + \Delta t\, \ddot{\mathbf{x}}(t)\\ \mathbf{x}(t + \Delta t) &=& \mathbf{x}(t) + \Delta t\, \dot{\mathbf{x}}(t) \end{eqnarray*}\]

where \(\Delta t\) is the integration time step.

For a given parameter \(w\), the simulation starts at \(t=0\) and finishes when the particle hits the \(x\)-axis, at position \(\mathbf{x}(t_f)\) at time \(t_f\). At that point we compute an error \(E(w) = \left(x_0(t_f)\right)^2\), the squared horizontal distance of the particle from the origin. We then minimize this error using Newton's method to find the value of \(w\) for which the particle hits the \(x\)-axis at the origin:

\[ w^{(i+1)} = w^{(i)} - \frac{E'(w^{(i)})}{E''(w^{(i)})}\]

In other words, the code computing the particle's trajectory internally takes the gradient of the electric potential \(p(\mathbf{x}; w)\); at the same time, the final position \(\mathbf{x}(t_f)\) of that trajectory defines an error whose gradient and Hessian are computed during the optimization procedure.
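
For intuition, a scalar Newton iteration can itself be written with nested derivatives. The sketch below uses DiffSharp's diff and diff2 operators and hypothetical helper names; it is shown only to make the update rule concrete, whereas the code that follows lets Optimize.Minimize with Method = Newton do the equivalent work on vector parameters.

open DiffSharp.AD.Float32

// One Newton step: w' = w - E'(w) / E''(w)
let newtonStep (E:D->D) (w:D) = w - diff E w / diff2 E w

// Iterate a fixed number of steps (no convergence check, illustration only)
let rec newton (E:D->D) (w:D) n =
    if n = 0 then w else newton E (newtonStep E w) (n - 1)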

Here's how it goes.

open Hype
open DiffSharp.AD.Float32

let dt = D 0.1f
let x0 = toDV [0.; 8.]
let v0 = toDV [0.75; 0.]

let p w (x:DV) = (1.f / DV.norm (x - toDV [D 10.f + w * D 0.f; D 10.f - w])) 
               + (1.f / DV.norm (x - toDV [10.; 0.]))

let trajectory (w:D) = 
    (x0, v0) 
    |> Seq.unfold (fun (x, v) ->
                    let a = -grad (p w)  x
                    let v = v + dt * a
                    let x = x + dt * v
                    Some(x, (x, v)))
    |> Seq.takeWhile (fun x -> x.[1] > D 0.f)

let error (w:DV) =
    let xf = trajectory w.[0] |> Seq.last
    xf.[0] * xf.[0]

let w, l, whist, lhist = Optimize.Minimize(error, toDV [0.], 
                                            {Params.Default with 
                                                Method = Newton; 
                                                LearningRate = Constant (D 1.f)
                                                ValidationInterval = 1;
                                                Epochs = 10})
[25/12/2015 23:53:10] --- Minimization started
[25/12/2015 23:53:10] Parameters     : 1
[25/12/2015 23:53:10] Iterations     : 10
[25/12/2015 23:53:10] Valid. interval: 1
[25/12/2015 23:53:10] Method         : Exact Newton
[25/12/2015 23:53:10] Learning rate  : Constant a = D 1.0f
[25/12/2015 23:53:10] Momentum       : None
[25/12/2015 23:53:10] Gradient clip. : None
[25/12/2015 23:53:10] Early stopping : None
[25/12/2015 23:53:10] Improv. thresh.: D 0.995000005f
[25/12/2015 23:53:10] Return best    : true
[25/12/2015 23:53:10]  1/10 | D  2.535113e+000 [- ]
[25/12/2015 23:53:10]  2/10 | D  7.528733e-002 [↓▼]
[25/12/2015 23:53:10]  3/10 | D  1.592970e-002 [↓▼]
[25/12/2015 23:53:10]  4/10 | D  4.178338e-003 [↓▼]
[25/12/2015 23:53:10]  5/10 | D  1.382800e-008 [↓▼]
[25/12/2015 23:53:11]  6/10 | D  3.274181e-011 [↓▼]
[25/12/2015 23:53:11]  7/10 | D  1.151079e-012 [↓▼]
[25/12/2015 23:53:11]  8/10 | D  1.151079e-012 [- ]
[25/12/2015 23:53:11]  9/10 | D  1.151079e-012 [- ]
[25/12/2015 23:53:11] 10/10 | D  3.274181e-011 [↑ ]
[25/12/2015 23:53:11] Duration       : 00:00:00.9201285
[25/12/2015 23:53:11] Value initial  : D  2.535113e+000
[25/12/2015 23:53:11] Value final    : D  1.151079e-012 (Best)
[25/12/2015 23:53:11] Value change   : D -2.535113e+000 (-100.00 %)
[25/12/2015 23:53:11] Value chg. / s : D -2.755173e+000
[25/12/2015 23:53:11] Iter. / s      : 10.86804723
[25/12/2015 23:53:11] Iter. / min    : 652.0828341
[25/12/2015 23:53:11] --- Minimization finished

val whist : DV [] = [|DV [|0.0f|]; DV [|0.20767726f|]; DV [|0.17457059f|]; DV [|0.190040559f|]; DV [|0.182180524f|]; DV [|0.182166189f|]; DV [|0.182166889f|]; DV [|0.182166755f|]; DV [|0.182166621f|]; DV [|0.182166487f|]|]
val w : DV = DV [|0.182166889f|]
val lhist : D [] = [|D 2.5351131f; D 2.5351131f; D 0.0752873272f; D 0.0159297027f; D 0.00417833822f; D 1.38279992e-08f; D 3.27418093e-11f; D 1.15107923e-12f; D 1.15107923e-12f; D 1.15107923e-12f|]
val l : D = D 1.15107923e-12f

(Chart: particle trajectories during the optimization of w)

Optimization parameters

As another example, let's optimize the Beale function

\[ f(\mathbf{x}) = (1.5 - x_1 + x_1 x_2)^2 + (2.25 - x_1 + x_1 x_2^2)^2 + (2.625 - x_1 + x_1 x_2^3)^2\]

starting from \(\mathbf{x} = (1, 1.5)\) and using RMSProp. The optimum is at \((3, 0.5)\).

let beale (x:DV) = (1.5f - x.[0] + (x.[0] * x.[1])) ** 2.f
                    + (2.25f - x.[0] + x.[0] * x.[1] ** 2.f) ** 2.f
                    + (2.625f - x.[0] + x.[0] * x.[1] ** 3.f) ** 2.f

let wopt, lopt, whist, lhist = Optimize.Minimize(beale, toDV [1.; 1.5], 
                                                    {Params.Default with 
                                                        Epochs = 3000; 
                                                        LearningRate = RMSProp (D 0.01f, D 0.9f)})
[12/11/2015 01:22:59] --- Minimization started
[12/11/2015 01:22:59] Parameters     : 2
[12/11/2015 01:22:59] Iterations     : 3000
[12/11/2015 01:22:59] Valid. interval: 10
[12/11/2015 01:22:59] Method         : Gradient descent
[12/11/2015 01:22:59] Learning rate  : RMSProp a0 = D 0.00999999978f, k = D 0.899999976f
[12/11/2015 01:22:59] Momentum       : None
[12/11/2015 01:22:59] Gradient clip. : None
[12/11/2015 01:22:59] Early stopping : None
[12/11/2015 01:22:59] Improv. thresh.: D 0.995000005f
[12/11/2015 01:22:59] Return best    : true
[12/11/2015 01:22:59]    1/3000 | D  4.125000e+001 [- ]
[12/11/2015 01:22:59]   11/3000 | D  2.655878e+001 [↓▼]
[12/11/2015 01:22:59]   21/3000 | D  2.154373e+001 [↓▼]
[12/11/2015 01:22:59]   31/3000 | D  1.841705e+001 [↓▼]
[12/11/2015 01:22:59]   41/3000 | D  1.624916e+001 [↓▼]
[12/11/2015 01:22:59]   51/3000 | D  1.465973e+001 [↓▼]
[12/11/2015 01:22:59]   61/3000 | D  1.334291e+001 [↓▼]
...
[12/11/2015 01:22:59] 2921/3000 | D  9.084024e-004 [- ]
[12/11/2015 01:22:59] 2931/3000 | D  9.084024e-004 [- ]
[12/11/2015 01:22:59] 2941/3000 | D  9.084024e-004 [- ]
[12/11/2015 01:22:59] 2951/3000 | D  9.084024e-004 [- ]
[12/11/2015 01:22:59] 2961/3000 | D  9.084024e-004 [- ]
[12/11/2015 01:22:59] 2971/3000 | D  9.084024e-004 [- ]
[12/11/2015 01:22:59] 2981/3000 | D  9.084024e-004 [- ]
[12/11/2015 01:22:59] 2991/3000 | D  9.084024e-004 [- ]
[12/11/2015 01:22:59] Duration       : 00:00:00.3142646
[12/11/2015 01:22:59] Value initial  : D  4.125000e+001
[12/11/2015 01:22:59] Value final    : D  8.948371e-004 (Best)
[12/11/2015 01:22:59] Value change   : D -4.124910e+001 (-100.00 %)
[12/11/2015 01:22:59] Value chg. / s : D -1.312560e+002
[12/11/2015 01:22:59] Iter. / s      : 9546.09587
[12/11/2015 01:22:59] Iter. / min    : 572765.7522
[12/11/2015 01:22:59] --- Minimization finished

val wopt : DV = DV [|2.99909306f; 0.50039643f|]

(Chart: contour plot of the Beale function with the optimization path)

(Chart: loss value per iteration)

Each run of gradient-based optimization is configured through a collection of parameters held in the Hype.Params type.

If you do not supply any parameters to the optimization call, the default parameter set Params.Default is used. The default parameters look like this:

module Params =
     let Default = {Epochs = 100
                    LearningRate = LearningRate.DefaultRMSProp
                    Momentum = NoMomentum
                    Loss = L2Loss
                    Regularization = Regularization.DefaultL2Reg
                    GradientClipping = NoClip
                    Method = GD
                    Batch = Full
                    EarlyStopping = NoEarly
                    ImprovementThreshold = D 0.995f
                    Silent = false
                    ReturnBest = true
                    ValidationInterval = 10
                    LoggingFunction = fun _ _ _ -> ()}

If you want to change only specific elements of the parameter record, you can do so with F# record update syntax, starting from Params.Default and overriding just the fields you need, like this:

let p = {Params.Default with
            Epochs = 5000
            LearningRate = LearningRate.AdaGrad (D 0.001f)
            Momentum = Nesterov (D 0.9f)}

Optimization method

type Method =
    | GD          // Gradient descent
    | CG          // Conjugate gradient
    | CD          // Conjugate descent
    | NonlinearCG // Nonlinear conjugate gradient
    | DaiYuanCG   // Dai & Yuan conjugate gradient
    | NewtonCG    // Newton conjugate gradient
    | Newton      // Exact Newton
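
For example (a sketch with illustrative settings, not taken from the examples above), second-order and conjugate-gradient methods are typically paired with a line search rather than a fixed learning rate:

let pNewtonCG = {Params.Default with
                    Method = NewtonCG
                    LearningRate = LearningRate.DefaultStrongWolfe}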

Learning rate

type LearningRate =
    | Constant    of D         // Constant
    | Decay       of D * D     // 1 / t decay, a = a0 / (1 + kt). Initial value, decay rate
    | ExpDecay    of D * D     // Exponential decay, a = a0 * Exp(-kt). Initial value, decay rate
    | Schedule    of DV        // Scheduled learning rate vector, its length overrides Params.Epochs
    | Backtrack   of D * D * D // Backtracking line search. Initial value, c, rho
    | StrongWolfe of D * D * D // Strong Wolfe line search. lmax, c1, c2
    | AdaGrad     of D         // Adagrad. Initial value
    | RMSProp     of D * D     // RMSProp. Initial value, decay rate
    static member DefaultConstant    = Constant (D 0.001f)
    static member DefaultDecay       = Decay (D 1.f, D 0.1f)
    static member DefaultExpDecay    = ExpDecay (D 1.f, D 0.1f)
    static member DefaultBacktrack   = Backtrack (D 1.f, D 0.0001f, D 0.5f)
    static member DefaultStrongWolfe = StrongWolfe (D 1.f, D 0.0001f, D 0.5f)
    static member DefaultAdaGrad     = AdaGrad (D 0.001f)
    static member DefaultRMSProp     = RMSProp (D 0.001f, D 0.9f)
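
A few sketches of how these can be plugged into the parameter record (the numeric values are purely illustrative):

let pDecay    = {Params.Default with LearningRate = ExpDecay (D 0.1f, D 0.05f)}
let pSchedule = {Params.Default with LearningRate = Schedule (toDV [0.1; 0.01; 0.001])} // runs for 3 epochs
let pAdaGrad  = {Params.Default with LearningRate = AdaGrad (D 0.01f)}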

Momentum

type Momentum =
    | Momentum of D // Default momentum
    | Nesterov of D // Nesterov momentum
    | NoMomentum
    static member DefaultMomentum = Momentum (D 0.9f)
    static member DefaultNesterov = Nesterov (D 0.9f)
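
For instance (illustrative coefficients), to add classical or Nesterov momentum to gradient descent:

let pMomentum = {Params.Default with Momentum = Momentum (D 0.9f)}
let pNesterov = {Params.Default with Momentum = Nesterov (D 0.9f)}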

Gradient clipping

type GradientClipping =
    | NormClip of D // Norm clipping
    | NoClip
    static member DefaultNormClip = NormClip (D 1.f)
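
A sketch (illustrative threshold) of enabling norm-based clipping, which can help against exploding gradients:

let pClip = {Params.Default with GradientClipping = NormClip (D 1.f)}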

Finally, the API reference and the source code of the optimization module give a fuller picture of the optimization algorithms currently implemented.
