This makes them robust to long sequences, and we see improvements over the initial iteration of UnitY. Since the model inherently predicts duration, it is much more easily adaptable to the streaming use case, because we know exactly how much speech is needed to be generated for each piece https://elliottpvxza.iyublog.com/23756123/un-imparziale-vista-metastreaming