Hype: Compositional Machine Learning and Hyperparameter Optimization

Hype is a proof-of-concept deep learning library that lets you perform optimization on compositional machine learning systems built from many components, even when those components themselves internally perform optimization.

This is enabled by nested automatic differentiation (AD), which gives you access to the exact derivative of any floating-point value in your code with respect to any other, computed automatically. Underlying computations are run by a BLAS/LAPACK backend (OpenBLAS by default).

Automatic derivatives

You do not need to supply gradients (or Hessians) of your models by hand; they are computed exactly and efficiently by AD. The underlying AD functionality is provided by DiffSharp.

"Reverse mode" AD is a generalized form of "backpropagation" and is distinct from numerical or symbolic differentiation.

In addition to reverse AD, Hype makes use of forward AD and nested combinations of forward and reverse AD. The core differentiation API provides gradients, Hessians, Jacobians, directional derivatives, and matrix-free exact Hessian- and Jacobian-vector products.
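For instance, here is a minimal sketch using the differentiation operators exposed by DiffSharp (this assumes the API obtained by opening DiffSharp.AD.Float32; the function f below is made up for this example):

open DiffSharp.AD.Float32

// A scalar-valued function of a vector (Rosenbrock-style, illustrative only)
let f (x:DV) =
    let a = D 1.f - x.[0]
    let b = x.[1] - x.[0] * x.[0]
    a * a + D 100.f * b * b

let x0 = DV [|0.5f; 0.5f|]
let g  = grad f x0                        // exact gradient, via reverse AD
let hv = hessianv f x0 (DV [|1.f; 0.f|])  // exact Hessian-vector product, matrix-free

// Nesting: differentiate a function whose body itself takes a derivative
let d  = diff (fun x -> x * diff (fun y -> x * y) (D 2.f)) (D 3.f)  // d/dx [x * d/dy (x*y)] = 2x = 6 at x = 3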

Hypergradients

You can get exact gradients of the training or validation loss with respect to hyperparameters. These hypergradients allow you to do gradient-based optimization of gradient-based optimization, meaning that you can do things like optimizing learning rate and momentum schedules, weight initialization parameters, or step sizes and mass matrices in Hamiltonian Monte Carlo models. (For a recent article doing this in Python, see Maclaurin, Dougal, David Duvenaud, and Ryan P. Adams. "Gradient-based Hyperparameter Optimization through Reversible Learning." arXiv preprint arXiv:1502.03492 (2015).)

open DiffSharp.AD.Float32
open Hype
open Hype.Neural

// 'data' is a Hype Dataset (e.g., MNIST training examples) assumed to be loaded elsewhere

// Train a network with stochastic gradient descent and a learning rate schedule
let train (x:DV) = 
    let n = FeedForward()
    n.Add(Linear(784, 300))
    n.Add(tanh)
    n.Add(Linear(300, 10))
    let loss, _ = Layer.Train(n, data, {Params.Default with 
                                        LearningRate = Schedule x
                                        Momentum = Momentum.DefaultNesterov
                                        Batch = Minibatch 100
                                        Loss = CrossEntropyOnLinear})
    loss // Return the loss at the end of training

// Train the training, i.e., optimize the learning-rate schedule vector by using its hypergradient
let hypertrain = 
    Optimize.Minimize(train, DV.create 200 (D 1.f), {Params.Default with Epochs = 50})

You can also take derivatives with respect to training data, to analyze training sensitivities.
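As a rough sketch of what this means (not using Hype's Dataset type; the model, loss, and data below are made up for illustration), you can express the loss after a short training run as a function of the training inputs and take its gradient:

open DiffSharp.AD.Float32

// Least-squares loss of a scalar weight w on training inputs xs and targets ys (illustrative)
let loss (xs:DV) (ys:DV) (w:D) = DV.L2NormSq (w * xs - ys)

// A short gradient-descent run on w, expressed as a function of the training inputs xs
let trainOn (xs:DV) =
    let ys = DV [|1.f; 2.f; 3.f|]
    let w  = Seq.fold (fun w _ -> w - D 0.01f * diff (loss xs ys) w) (D 0.f) {1..50}
    loss xs ys w

// Sensitivity of the final training loss to each individual training input
let sensitivity = grad trainOn (DV [|1.f; 2.f; 3.f|])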

Compositionality

Nested AD handles higher-order derivatives up to any level, including in complex cases such as

\[ \mathbf{min} \left(x \; \mapsto \; f(x) + \mathbf{min} \left( y \; \mapsto \; g(x,\,y) \right) \right)\, ,\]

where \(\mathbf{min}\) uses gradient-based optimization. (Note that the inner function has a reference to the argument of the outer function.) This allows you to create complex systems where many components may internally perform optimization.
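A minimal sketch of this pattern, using a naive gradient-descent minimizer written with DiffSharp (argminD, minD, f, and g are illustrative names, not part of the Hype API):

open DiffSharp.AD.Float32

// Naive fixed-step gradient descent over a scalar argument (illustrative only)
let argminD (h:D -> D) (x0:D) = Seq.fold (fun x _ -> x - D 0.01f * diff h x) x0 {1..200}
let minD h x0 = h (argminD h x0)

// The inner objective closes over the outer variable x
let g (x:D) (y:D) = (y - x) * (y - x) + x * x
let f (x:D) = x * x

// min over x of [ f(x) + min over y of g(x, y) ]; nested AD keeps the derivatives exact
let xopt = argminD (fun x -> f x + minD (g x) (D 0.f)) (D 1.f)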

For example, you can optimize the rules of a multi-player game in which the players themselves optimize their own strategies, each using a simple model of its opponent that it fits to the opponent's observed behaviour.

Or you can optimize procedures that internally use differentiation for purposes other than optimization, such as adaptive control or simulation.

Complex objective functions

You can use derivatives in the definition of objective functions for training your models. For example, your objective function can take input sensitivities into account, for training models that are invariant under a set of input transformations.
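For instance, here is a sketch of an objective that penalizes the model's sensitivity to its input (the model, weights, and data below are made up for illustration):

open DiffSharp.AD.Float32

// A scalar model output as a function of an input vector, parameterized by weights w (illustrative)
let model (w:DV) (x:DV) = tanh (w * x)   // w * x is an inner product returning a scalar D

// Data-fitting term plus a penalty on the gradient of the model with respect to its input,
// encouraging outputs that are locally invariant to small input perturbations
let objective (w:DV) =
    let x, y = DV [|0.2f; 0.7f|], D 1.f
    let err  = model w x - y
    err * err + D 0.1f * DV.L2NormSq (grad (model w) x)

Training by differentiating objective with respect to w (for example with Optimize.Minimize) then involves differentiating through the inner grad, which nested AD handles exactly.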

Roadmap

In the current release
  • OpenBLAS backend by default
  • Regression, feedforward neural networks
  • Recurrent neural networks, LSTMs, GRUs
  • Hamiltonian Monte Carlo
Upcoming features
  • GPU/CUDA backend
  • Probabilistic inference
  • Convolutional neural networks

About

Hype is developed by Atılım Güneş Baydin and Barak A. Pearlmutter at the Brain and Computation Lab, Hamilton Institute, National University of Ireland Maynooth.

License

Hype is released under the MIT license.
