198 - Diamond: How do users improve the performance of FPGA designs that use Embedded Block RAM (EBR)?
FPGA designs that contain Embedded Block RAM (EBR) often fail to meet system performance requirements. Here are some suggestions for improving the performance of designs that use EBR.
- There may be a large amount of logic surrounding the EBR. Inspect the HDL source code and try to reduce the amount of logic surrounding the EBR. It may be possible to get identical functionality using a different HDL representation of the logic.
- The synthesis tool may not infer the logic for the EBR optimally. Take some time to review both the logical implementation and the device-specific implementation created by the synthesis tool. It may be necessary to change the HDL code or to apply synthesis constraints in order to achieve optimal synthesis results.
- Routing delays to/from an EBR are sometimes larger than to/from a logic block (PFU/PFF). The larger delay results in lower system performance. It may be necessary to constrain the FPGA mapper and the place-and-route engine so that the EBR and PFU/PFF resources are placed close together. This is done using PGROUP and UGROUP preferences.
- EBR memories in most Lattice FPGAs have input registers on the address bus, but the registers on the EBR data outputs can be enabled or disabled. The data propagation time from address valid to data out, plus routing delay and setup time, may exceed system requirements. It may be possible to improve overall system performance by enabling the output registers on the EBR and adding a pipeline stage to compensate for the extra cycle of latency that the register introduces. The address-to-data-valid time into the EBR's attached output register is much shorter than into an equivalent register placed in a PFU/PFF.
- The EBR may not actually need to run at the highest clock rate in the design. If the EBR isn't in the critical path, consider adding a MULTICYCLE preference. The MULTICYCLE preference relaxes the clock-to-clock time required for data to propagate.
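
The effect of the output register and of a MULTICYCLE preference on the timing budget can be illustrated with a little arithmetic. The Python sketch below is not a Lattice tool or API. The function name, the field names, and every delay value are hypothetical placeholders chosen for illustration, so substitute the numbers from your own timing report. It assumes the read path is simply EBR clock-to-out plus routing, downstream logic, and destination setup.

```python
from dataclasses import dataclass


@dataclass
class EbrReadPath:
    """Delays in nanoseconds. All values are made-up placeholders, not datasheet numbers."""
    clk_to_out_unregistered: float  # clock edge to data out, EBR output register disabled
    clk_to_out_registered: float    # clock edge to data out of the EBR output register
    routing: float                  # routing from the EBR to the destination register
    logic: float                    # combinational logic between EBR and destination
    setup: float                    # setup time of the destination register


def max_clock_mhz(path: EbrReadPath, output_register: bool, multicycle: int = 1) -> float:
    """Estimate the highest clock rate the EBR read path allows.

    multicycle=N models a MULTICYCLE preference that gives the path
    N clock periods to propagate instead of one.
    """
    if multicycle < 1:
        raise ValueError("multicycle must be at least 1")
    clk_to_out = (path.clk_to_out_registered if output_register
                  else path.clk_to_out_unregistered)
    path_ns = clk_to_out + path.routing + path.logic + path.setup
    min_period_ns = path_ns / multicycle
    return 1000.0 / min_period_ns


# Hypothetical example numbers only.
delays = EbrReadPath(clk_to_out_unregistered=3.0, clk_to_out_registered=0.5,
                     routing=1.0, logic=1.0, setup=0.5)

print("output register off:", max_clock_mhz(delays, output_register=False), "MHz")
print("output register on: ", max_clock_mhz(delays, output_register=True), "MHz")
print("off, 2-cycle path:  ", max_clock_mhz(delays, output_register=False, multicycle=2), "MHz")
```

With the output register enabled, read data arrives one clock later, so the surrounding logic needs the compensating pipeline stage described above. The multicycle case is only valid when the design really does sample the EBR data every Nth clock.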