Early architecture using learned gates for skip connections.
Allows gradients to bypass layers, enabling very deep networks.
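
The two entries above describe a gated skip connection without giving a formula. A minimal sketch in pure Python follows, assuming the highway-style form y = t * h(x) + (1 - t) * x, where t is a learned gate in (0, 1). The function name, the parameter names, and the element-wise (rather than full-matrix) weights are all hypothetical simplifications, not taken from the source.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, used here to squash the gate into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_skip_layer(x, w_h, b_h, w_t, b_t):
    """Apply one element-wise gated skip layer to a list of floats.

    All parameter names are hypothetical. Each unit i computes
        h = tanh(w_h[i] * x[i] + b_h[i])     # transformed signal
        t = sigmoid(w_t[i] * x[i] + b_t[i])  # learned gate in (0, 1)
        y = t * h + (1 - t) * x[i]           # gate mixes transform and input
    """
    out = []
    for xi, wh, bh, wt, bt in zip(x, w_h, b_h, w_t, b_t):
        h = math.tanh(wh * xi + bh)
        t = sigmoid(wt * xi + bt)
        out.append(t * h + (1.0 - t) * xi)
    return out


x = [0.5, -1.0, 2.0]
ones, zeros = [1.0] * 3, [0.0] * 3

# A strongly negative gate bias closes the gate: the input passes through almost unchanged.
closed = gated_skip_layer(x, ones, zeros, zeros, [-20.0] * 3)

# A strongly positive gate bias opens the gate: the output is almost exactly tanh(x).
opened = gated_skip_layer(x, ones, zeros, zeros, [20.0] * 3)
```

With the gate nearly closed, the layer behaves almost like the identity, so the derivative of its output with respect to its input stays close to 1. That identity path is what lets gradients bypass the transformation when many such layers are stacked.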