BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260601T092021EDT-3302VsXmnO@132.216.98.100
DTSTAMP:20260601T132021Z
DESCRIPTION:Title: Block-State Transformers\n\nAbstract: State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies and efficiently scale to long sequences owing to their subquadratic runtime complexity. Originally designed for continuous signals\, SSMs have shown superior performance on a plethora of tasks\, in vision and audio\; however\, SSMs still lag Transformer performance in Language Modeling tasks. In this work\, we propose a hybrid layer named Block-State Transformer (BST)\, that internally combines an SSM sublayer for long-range contextualization\, and a Block Transformer sublayer for short-term representation of sequences. We study three different\, and completely parallelizable\, variants that integrate SSMs and block-wise attention. We show that our model outperforms similar Transformer-based architectures on language modeling perplexity and generalizes to longer sequences. In addition\, the Block-State Transformer demonstrates more than tenfold increase in speed at the layer level compared to the Block-Recurrent Transformer when model parallelization is employed.\n
DTSTART:20231115T180000Z
DTEND:20231115T190000Z
LOCATION:Room 1214\, Burnside Hall\, CA\, QC\, Montreal\, H3A 0B9\, 805 rue Sherbrooke Ouest
SUMMARY:Jonathan Pilault (MILA) CANCELED
URL:/mathstat/channels/event/jonathan-pilault-mila-canceled-352698
END:VEVENT
END:VCALENDAR