Cyclone: A broadcast-free dynamic instruction scheduler with selective replay
To achieve high instruction throughput, instruction schedulers must be capable of producing high-quality schedules that maximize functional unit utilization while at the same time enabling fast instruction issue logic. Many solutions exist to the scheduling problem, ranging from compile-time to run-time approaches. Compile-time solutions feature fast and simple hardware, but at the expense of conservative schedules. Dynamic schedulers produce high-quality schedules that incorporate run-time information and dependence speculation, but implementing these schedulers requires complex circuits that can slow processor clock speeds. In this paper, we present the Cyclone scheduler, a novel design that captures the benefits of both compile- and run-time scheduling. Our approach utilizes a list-based single-pass instruction scheduling algorithm, implemented by hardware at run-time in the front end of the processor pipeline. Once scheduled, instructions are injected into a timed queue that orchestrates their entry into execution. To accommodate branch and load/store dependence speculation, the Cyclone scheduler supports a simple selective replay mechanism. We implement this technique by overloading instruction register forwarding to also detect instructions dependent on incorrectly scheduled operations. Detailed simulation analyses suggest that with sufficient queue width, the Cyclone scheduler can rival the instruction throughput of similarly wide monolithic dynamic schedulers. Furthermore, the circuit complexity of the Cyclone scheduler is much more favorable than that of a broadcast-based scheduler, as our approach requires no global control signals.
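The single-pass scheduling and timed-queue idea described in the abstract can be illustrated with a small software sketch. This is not the paper's hardware design: the names `Instr`, `preschedule`, and `drain_timed_queue`, the fixed per-instruction latencies, and the example program are all assumptions made for illustration. The sketch also ignores queue width, functional unit limits, and the selective replay mechanism that handles mispredicted latencies.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str        # destination register
    srcs: tuple      # source registers
    latency: int     # predicted execution latency in cycles (assumed known)


def preschedule(program):
    """One pass in program order: predict the cycle each instruction can issue."""
    ready_at = {}    # register -> predicted cycle when its value is available
    schedule = []
    for ins in program:
        # An instruction may issue once all of its source operands are predicted ready.
        issue = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        ready_at[ins.dest] = issue + ins.latency
        schedule.append((ins.name, issue))
    return schedule


def drain_timed_queue(schedule):
    """Release instructions when their predicted cycle arrives (no wakeup broadcast)."""
    pending = list(schedule)
    issued_log = []
    cycle = 0
    while pending:
        issued = [name for name, when in pending if when <= cycle]
        pending = [(name, when) for name, when in pending if when > cycle]
        if issued:
            issued_log.append((cycle, issued))
        cycle += 1
    return issued_log


# Hypothetical four-instruction program; r0 is assumed ready at cycle 0.
program = [
    Instr("ld", "r1", ("r0",), 3),
    Instr("add", "r2", ("r1", "r1"), 1),
    Instr("mul", "r3", ("r0", "r0"), 2),
    Instr("sub", "r4", ("r2", "r3"), 1),
]
schedule = preschedule(program)
issued_log = drain_timed_queue(schedule)
```

In this sketch each instruction carries its own predicted issue time, so nothing has to broadcast a result tag to waiting instructions. If a prediction turned out to be wrong (for example, a load that misses in the cache), a real design would need a recovery path such as the selective replay described in the abstract, which is not modeled here.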
ACM Digital Library