Transformer Networks

Transformer networks use self-attention mechanisms to process sequential data without the recurrence found in RNNs. Because every token can attend to every other token in a single matrix operation, they capture long-range dependencies and parallelize training more efficiently.
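As a rough illustration of the core operation, the following is a minimal sketch of single-head scaled dot-product self-attention. It assumes NumPy and small, made-up dimensions; the projection matrices and sequence length here are purely illustrative, not part of any particular model.

```python
# Minimal sketch of single-head scaled dot-product self-attention (illustrative only).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project inputs to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # pairwise similarities, scaled by sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                          # weighted sum of value vectors

# Example: 4 tokens, model width 8 (hypothetical sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one context-mixed vector per token
```

Every output row is computed from all input positions at once, which is why the computation maps naturally onto parallel hardware, unlike the step-by-step updates of an RNN.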