
Local to global: Learning dynamics and effect of initialization for transformers