An interesting idea: instead of fixing the number of branch delay slots (or having none), what about splitting branches in two? That is, you add a branch_commit instruction, and anything between the branch and its branch_commit is treated as the branch delay slot(s), with the processor either inserting pipeline bubbles or stalling the branch as necessary. If you want to get fancy about it, you could require matched branch / commit pairs (nested like brackets), or potentially even allow more general matching.
The decreased instruction density may kill the benefits, though, since every branch now costs an extra instruction.
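To make the semantics concrete, here is a minimal sketch of a toy interpreter for the split-branch idea. Everything in it is an assumption made up for illustration: the opcode names (`addi`, `bnez`, `commit`, `halt`), the single pending-branch slot (so no nesting), and the choice to resolve the condition at the branch rather than at the commit. It models only the architectural behaviour, not pipeline timing.

```python
def run(program, regs=None, max_steps=1000):
    """Execute a toy split-branch program; return (registers, trace of PCs)."""
    regs = dict(regs or {})
    pc = 0
    pending = None  # target of a taken branch that is waiting for its commit
    trace = []
    for _ in range(max_steps):
        op, *args = program[pc]
        trace.append(pc)
        pc += 1
        if op == "addi":            # reg += immediate
            reg, imm = args
            regs[reg] = regs.get(reg, 0) + imm
        elif op == "bnez":          # decide the branch now, redirect later
            reg, target = args
            pending = target if regs.get(reg, 0) != 0 else None
        elif op == "commit":        # end of the delay region: redirect if taken
            if pending is not None:
                pc = pending
            pending = None
        elif op == "halt":
            return regs, trace
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise RuntimeError("step limit exceeded")


# Hypothetical program: count n down to zero, adding 10 to acc per iteration.
PROGRAM = [
    ("addi", "n", -1),    # 0: n -= 1
    ("bnez", "n", 0),     # 1: branch to 0 if n != 0 (takes effect at commit)
    ("addi", "acc", 10),  # 2: delay region, always executes
    ("commit",),          # 3: the branch actually happens here
    ("halt",),            # 4
]

if __name__ == "__main__":
    final_regs, pc_trace = run(PROGRAM, {"n": 3})
    print(final_regs, pc_trace)
```

Starting with n = 3, the instruction at index 2 runs on every pass, including the last one where the branch is not taken, which is exactly the delay-slot behaviour. The program also shows the density cost: the `commit` at index 3 is an instruction a fixed-delay-slot encoding would not need.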