Moreover, they exhibit a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget. By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes: (1) low-complexity tasks