Recurrent neural networks

In this example we build a recurrent neural network (RNN) for a language modeling task and train it on a short passage of text as a quick demonstration. Hype currently provides three RNN models implemented as Hype.Neural layers, which can be freely combined with the other layer types explained on the neural networks page: Hype.Neural.Recurrent implements the "vanilla" RNN layer, Hype.Neural.LSTM implements the long short-term memory (LSTM) layer, and Hype.Neural.GRU implements the gated recurrent unit (GRU) layer.
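As a quick orientation, the three layer types are constructed similarly. LSTM(inputs, memcells) is the signature used later in this example; the constructors for Recurrent and GRU are assumed analogous here, so check the API reference for the exact signatures.

open Hype.Neural

let vanilla = Recurrent(20, 100) // assumed: "vanilla" RNN, 20 inputs, 100 hidden units
let lstm    = LSTM(20, 100)      // LSTM, 20 inputs, 100 memory cells
let gru     = GRU(20, 100)       // assumed: GRU, 20 inputs, 100 hidden units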

Language modeling

RNNs are well suited for constructing language models, where we need to predict the probability of a word (or token) given the history of tokens that came before it. Here we use an LSTM-based RNN to construct a word-level language model from a short passage of text as a basic demonstration of usage. The same model can be scaled to larger problems, although state-of-the-art models of this type can require considerable computing resources and training time.
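Concretely, a language model assigns a probability to a whole token sequence by factoring it into per-token conditionals, each conditioned on the history that the RNN's hidden state summarizes:

p(w_1, \dots, w_T) = \prod_{t=1}^{T} p(w_t \mid w_1, \dots, w_{t-1})

At each step the network outputs the distribution over the vocabulary for the next token; training maximizes the probability it assigns to the observed next word.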

The text is from the beginning of Virgil's Aeneid, Book I.

let text = "I sing of arms and the man, he who, exiled by fate, first came from the coast of Troy to Italy, and to Lavinian shores – hurled about endlessly by land and sea, by the will of the gods, by cruel Juno’s remorseless anger, long suffering also in war, until he founded a city and brought his gods to Latium: from that the Latin people came, the lords of Alba Longa, the walls of noble Rome. Muse, tell me the cause: how was she offended in her divinity, how was she grieved, the Queen of Heaven, to drive a man, noted for virtue, to endure such dangers, to face so many trials? Can there be such anger in the minds of the gods?"

Hype provides a simple Hype.NLP.Language type for tokenizing text. You can look at the API reference and the source code for a better understanding of its usage.

open Hype
open Hype.Neural
open Hype.NLP
open DiffSharp.AD.Float32
open DiffSharp.Util

let lang = Language(text)

lang.Tokens |> printfn "%A"
lang.Length |> printfn "%A"

These are the tokens extracted from the text, including some of the punctuation marks. When sampling from the RNN language model, we will use the "." token to signal the end of a sentence. The punctuation marks are configurable when constructing the Language instance; if they are not provided, a default set is used.

[|","; "."; ":"; "?"; "Alba"; "Can"; "Heaven"; "I"; "Italy"; "Juno’s"; "Latin";
  "Latium"; "Lavinian"; "Longa"; "Muse"; "Queen"; "Rome"; "Troy"; "a"; "about";
  "also"; "and"; "anger"; "arms"; "be"; "brought"; "by"; "came"; "cause"; "city";
  "coast"; "cruel"; "dangers"; "divinity"; "drive"; "endlessly"; "endure";
  "exiled"; "face"; "fate"; "first"; "for"; "founded"; "from"; "gods"; "grieved";
  "he"; "her"; "his"; "how"; "hurled"; "in"; "land"; "long"; "lords"; "man";
  "many"; "me"; "minds"; "noble"; "noted"; "of"; "offended"; "people";
  "remorseless"; "sea"; "she"; "shores"; "sing"; "so"; "such"; "suffering";
  "tell"; "that"; "the"; "there"; "to"; "trials"; "until"; "virtue"; "walls";
  "war"; "was"; "who"; "will"; "–"|]

86

There are 86 tokens in this language instance.
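The punctuation set can also be supplied explicitly through the Language(text, punctuation) constructor. For example, to treat only these four marks as tokens (a sketch; the resulting token set will differ from the default one above):

let lang2 = Language(text, [|","; "."; ":"; "?"|])
lang2.Tokens |> printfn "%A"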

Now let's transform the full text into a dataset, using the Language instance holding these tokens. The text is encoded as a matrix where each column represents one word as a one-hot vector.

let text' = lang.EncodeOneHot(text)
text'.Visualize() |> printfn "%s"
DM : 86 x 145

From these 145 words, we construct a dataset where the inputs are the first 144 words and the target outputs are the last 144 words, i.e., the same sequence shifted by one word. In other words, for each input word we want the network to predict the word that follows it in the passage.

let data = Dataset(text'.[*, 0..(text'.Cols - 2)],
                   text'.[*, 1..(text'.Cols - 1)])
val data : Dataset = Hype.Dataset
   X: 86 x 144
   Y: 86 x 144
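As a sanity check of the one-word shift, we can decode the first few columns of the inputs and targets back to words. This sketch assumes the dataset's matrices are exposed as data.X and data.Y (as suggested by the printout above) and uses Language.DecodeOneHot:

data.X.[*, 0..4] |> lang.DecodeOneHot |> printfn "%A" // expected: [|"I"; "sing"; "of"; "arms"; "and"|]
data.Y.[*, 0..4] |> lang.DecodeOneHot |> printfn "%A" // expected: [|"sing"; "of"; "arms"; "and"; "the"|]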

RNNs, and especially the LSTM variety that we will use, can make predictions that take long-term dependencies and contextual information into account. Given a large enough training corpus and sufficient network capacity, state-of-the-art RNN language models can learn complex grammatical relations.

For our quick demonstration, we use a linear word-embedding layer of 20 units, an LSTM of 100 units, and a final linear layer of 86 units (the size of our vocabulary) followed by a softmax activation.

let dim = lang.Length // Vocabulary size, here 86

let n = FeedForward()
n.Add(Linear(dim, 20))      // linear word embedding: one-hot -> 20 dimensions
n.Add(LSTM(20, 100))        // LSTM with 100 memory cells
n.Add(Linear(100, dim))     // project back to vocabulary size
n.Add(DM.mapCols softmax)   // softmax over each column: word probabilities

You can also easily stack multiple RNNs on top of each other.

let n = FeedForward()
n.Add(Linear(dim, 20))
n.Add(LSTM(20, 100))
n.Add(LSTM(100, 100))
n.Add(Linear(100, dim))
n.Add(DM.mapCols softmax)

We will observe the performance of our RNN during training by sampling random sentences from the language model.

Remember that the final output of the network, through the softmax activation, is a vector of word probabilities. When sampling, we start with a word, supply it to the network, and use the resulting output probabilities to sample a word from the vocabulary, where words with higher probability are more likely to be selected. We then feed the sampled word back into the network and repeat until we hit an "end of sentence" token (we use "." here) or reach a maximum sentence length.
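Under the hood, this loop can be written with a few of the pieces we already have. The following is a minimal sketch, not Hype's actual implementation: it assumes the Language.EncodeOneHot(string) and Language.Sample(probs:DV) overloads shown in the API reference, and that the network's recurrent state has been reset beforehand with n.Reset().

// A hypothetical hand-rolled sampler; Hype's built-in
// lang.Sample(model, start, stop, maxlen) wraps this logic for us.
let sampleSentence (model:DM->DM) (start:string) (stop:string[]) (maxlen:int) =
    let words = ResizeArray<string>([start])
    let mutable last = start
    while (words.Count < maxlen) && not (Array.exists ((=) last) stop) do
        let probs = model (lang.EncodeOneHot last)  // one column of word probabilities
        let next = lang.Sample(probs.[*, 0])        // draw the next word from the distribution
        words.Add(next)
        last <- next
    words |> String.concat " "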

This is how we would sample a sentence starting with a specific word.

n.Reset()
for i = 0 to 5 do
    lang.Sample(n.Run, "I", [|"."|], 30) // Use "." as the stop token, limit maximum sentence length to 30.
    |> printfn "%s"

Because the model is not trained, we get sequences of random words from the vocabulary.

I be: she dangers Latium endlessly gods remorseless divinity tell and his offended lords trials? about war trials and anger shores so anger Alba a Alba sing her
I? came exiled – suffering shores anger came Latium people sing sing remorseless who brought war walls endlessly anger me founded his.
I – will long of in offended cruel until Queen Italy who anger lords Queen in Longa Muse who people about suffering Italy also grieved cruel hurled who me about
I endlessly city first by face, a Heaven me hurled sea such long noted she noted many sea city anger I noted remorseless cause Queen to remorseless Italy coast
I sea noted noble me minds long sing cause people in walls Italy by Longa first, for grieved sea many walls Troy came was endlessly of in Latium Latium
I and Latin of many suffering Alba Latium war.

We set up a training cycle in which we run one epoch of training followed by sampling one sentence starting with the word "I". In each epoch, we run through the whole training dataset. With a larger training corpus, we could also train with minibatches by stating this in the parameter set (commented out below).

Like the sample sentences above, at the beginning of training we see mostly random orderings of words. As training progresses, the cross-entropy loss on our dataset decreases and the sentences start exhibiting meaningful word patterns.
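With CrossEntropyOnSoftmax, the quantity minimized is the cross-entropy between the softmax output columns and the one-hot targets. Up to the exact normalization Hype uses (an assumption here), this is the average negative log-probability assigned to each correct next word:

L = -\frac{1}{N} \sum_{t=1}^{N} \log p(w_{t+1} \mid w_1, \dots, w_t)

A uniform guess over our 86-word vocabulary would give L = \ln 86 \approx 4.45, which is roughly where the loss starts in the run below.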

for i = 0 to 1000 do
    let par = {Params.Default with
                //Batch = Minibatch 10
                LearningRate = LearningRate.RMSProp(D 0.01f, D 0.9f)
                Loss = CrossEntropyOnSoftmax
                Epochs = 1
                Silent = true       // Suppress the regular printing of training progress
                ReturnBest = false} 
    let loss, _ = Layer.Train(n, data, par)
    printfn "Epoch: %*i | Loss: %O | Sample: %s" 3 i loss (lang.Sample(n.Run, "I", [|"."|], 30))

Here is a selection of sentences demonstrating the progress of training.

Epoch:   0 | Loss: D  4.478101e+000 | Sample: I Queen drive she Alba endlessly Queen the by how tell his from grieved war her there drive people – lords coast he.
Epoch:  10 | Loss: D  4.102071e+000 | Sample: I people to,, Rome how the he of – sing fate, Muse, by,, Muse the of man Queen Latin and in her cause:
Epoch:  30 | Loss: D  3.438288e+000 | Sample: I walls long to first dangers she her, to founded to virtue sea first Can dangers a founded about Can Queen lords from sea by remorseless founded endlessly Latium
Epoch:  40 | Loss: D  2.007577e+000 | Sample: I Alba gods Alba Rome, the walls Alba Muse Rome anger me the the of the gods to who man me first founded offended endlessly until also grieved long
Epoch:  50 | Loss: D  9.753818e-001 | Sample: I sing people cruel: me the of Rome.
Epoch:  60 | Loss: D  3.944587e-001 | Sample: I sing sing Troy to so hurled endlessly by land sea, by to – hurled about by the of arms, by Juno’s such anger long also in her
Epoch:  70 | Loss: D  2.131431e-001 | Sample: I sing of and the of Longa, by Juno’s anger was in her of Heaven, to a city brought his gods to a gods to Lavinian hurled to
Epoch:  80 | Loss: D  1.895453e-001 | Sample: I sing, by will the of Rome.
Epoch:  90 | Loss: D  1.799535e-001 | Sample: I sing? there Muse the of the of the of arms by the: how she offended in the of? a, he shores hurled by land to
Epoch: 100 | Loss: D  1.733837e-001 | Sample: I sing arms the of Alba gods who, by Juno’s Rome such anger the of the of arms and, by, by from the coast Rome.
Epoch: 110 | Loss: D  1.682917e-001 | Sample: I sing Troy by, by from the of arms and, by, by from came, by Juno’s anger long in the of the of arms cruel Muse
Epoch: 120 | Loss: D  1.639529e-001 | Sample: I sing arms the of Rome.
Epoch: 130 | Loss: D  1.600647e-001 | Sample: I sing arms and, by Juno’s remorseless there and the of the of arms and, by Alba coast Troy to a – his gods by of the of
Epoch: 140 | Loss: D  1.564835e-001 | Sample: I sing arms by the of Rome.
Epoch: 150 | Loss: D  1.531392e-001 | Sample: I sing arms cruel, exiled by coast, he a city in the of the of arms.
Epoch: 160 | Loss: D  1.499920e-001 | Sample: I sing arms cruel man, by the trials arms to shores hurled endlessly by the of gods Italy, me the of Rome.
Epoch: 200 | Loss: D  1.390327e-001 | Sample: I sing arms and, by Juno’s such of the of the of arms Italy, by from the sing arms walls of the of Rome.
Epoch: 230 | Loss: D  1.322940e-001 | Sample: I sing arms the man he, tell from the of arms Italy, by fate, by the of Troy Italy, by fate first from the of the
Epoch: 260 | Loss: D  1.264137e-001 | Sample: I sing brought Muse Muse the of Heaven, by shores remorseless there he in the of arms cruel, by fate, he from the gods to Italy,
Epoch: 420 | Loss: D  1.131158e-001 | Sample: I sing of arms the of Heaven, by Juno’s remorseless hurled such in the of arms.
Epoch: 680 | Loss: D  9.938217e-002 | Sample: I of arms the man he, exiled fate, he virtue, to a? Can be such in the of the of of the of arms.
Epoch: 923 | Loss: D  9.283429e-002 | Sample: I sing of arms and the man he, by fate came from the of to Italy, by the, by Juno’s anger of Rome.