CummingsICU1997 VerilogCodingEfficiency
Clifford E. Cummings [email protected] / www.sunburst-design.com Sunburst Design, Inc. 14314 SW Allen Blvd. PMB 501 Beaverton, OR 97005
INTERNATIONAL CADENCE USER GROUP CONFERENCE OCTOBER 5-9, 1997 SAN DIEGO, CALIFORNIA
1. Introduction
What are some of the more optimal ways to code Verilog models and testbenches to shorten simulation times? This paper is a collection of interesting coding style comparisons that have been run on Verilog-XL.
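module CaseMux8 (y, i, sel);
  output       y;
  input  [7:0] i;
  input  [2:0] sel;
  reg          y;
  wire   [7:0] i;
  wire   [2:0] sel;

  always @(i or sel)
    case (sel)
      3'd0: y = i[0];
      3'd1: y = i[1];
      3'd2: y = i[2];
      3'd3: y = i[3];
      3'd4: y = i[4];
      3'd5: y = i[5];
      3'd6: y = i[6];
      3'd7: y = i[7];
    endcase
endmodule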
Figure 1
module IfMux8 (y, i, sel);
  output       y;
  input  [7:0] i;
  input  [2:0] sel;
  reg          y;
  wire   [7:0] i;
  wire   [2:0] sel;

  always @(i or sel)
    if      (sel == 3'd0) y = i[0];
    else if (sel == 3'd1) y = i[1];
    else if (sel == 3'd2) y = i[2];
    else if (sel == 3'd3) y = i[3];
    else if (sel == 3'd4) y = i[4];
    else if (sel == 3'd5) y = i[5];
    else if (sel == 3'd6) y = i[6];
    else if (sel == 3'd7) y = i[7];
endmodule
Figure 2
International Cadence Users Group 1997, Rev 1.1, Verilog Coding Styles For Improved Simulation Efficiency
Sparc5/32MB RAM 491MB Swap Solaris version 2.5 Verilog-XL 2.2.1 - Undertow 5.3.3 CaseMux8 IfMux8
4. Begin-End Statements
Question: Is there a simulation efficiency difference when extra begin-end pairs are added to Verilog models? The testcase for this benchmark is a synthesizable D flip-flop with asynchronous reset. In Figure 3, the synthesizable flip-flop was written with no begin-end statements in the always block (they are not needed for this model). In Figure 4, the same flip-flop was written with three unnecessary begin-end statements. 1000 flip-flops were then instantiated into a testbench and simulated.
4.1 Begin-End Efficiency Summary
The results in Table 2 show that, using Verilog-XL, the flip-flop with three extra begin-end statements took about 6% more memory to implement and about 6% more time to simulate than the flip-flop with no begin-end statements.
// Removed unneeded begin-end pairs
module dff (q, d, clk, rst);
  output q;
  input  d, clk, rst;
  reg    q;

  always @(posedge clk or posedge rst)
    if (rst == 1) q = 0;
    else          q = d;
endmodule
Figure 3

// Includes unneeded begin-end pairs
module dff (q, d, clk, rst);
  output q;
  input  d, clk, rst;
  reg    q;

  always @(posedge clk or posedge rst) begin
    if (rst == 1) begin
      q = 0;
    end
    else begin
      q = d;
    end
  end
endmodule
Figure 4
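The benchmark testbench that instantiates the 1000 flip-flops is not shown. The following is only a minimal sketch of how such a testbench might look. The module name tb_dff1000, the instance name u, the clock period, and the stimulus are all assumptions, and it uses an array of instances, which requires a simulator that supports that IEEE 1364-1995 feature.

```verilog
// Hypothetical testbench sketch; not the testbench used for the benchmark.
`define CNT 1000
`timescale 1ns / 1ns
module tb_dff1000;
  reg          clk, rst;
  reg  [999:0] d;
  wire [999:0] q;

  // Array of 1000 dff instances: each instance gets one bit of q and d,
  // while the scalar clk and rst signals are shared by all instances.
  dff u [999:0] (q, d, clk, rst);

  // Free-running clock (period is an assumption)
  initial begin
    clk = 0;
    forever #10 clk = ~clk;
  end

  // Apply reset, then toggle every d input on each falling clock edge
  initial begin
    rst = 1;
    d   = 0;
    @(negedge clk) rst = 0;
    repeat (`CNT) @(negedge clk) d = ~d;
    @(negedge clk) $finish;
  end
endmodule
```

Either version of the dff model can be compiled with this sketch, since both use the same module name and port list.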
Sparc5/32MB RAM 491MB Swap Solaris version 2.5 Verilog-XL 2.2.1 Undertow 5.3.3 nobegin begin
Table 2
5.1 Define & Parameter Efficiency Summary
The results in Table 3 show that parameter redefinition and defparams occupy about 25% - 35% more memory than do `define statements.
module paramchk (y, i, en);
  output [9:0] y;
  input        i, en;

  parameter Tr_min = 2;
  parameter Tr_typ = 5;
  parameter Tr_max = 8;
  parameter Tf_min = 1;
  parameter Tf_typ = 4;
  parameter Tf_max = 7;
  parameter Tz_min = 3;
  parameter Tz_typ = 6;
  parameter Tz_max = 9;

  bufif1 #(Tr_min:Tr_typ:Tr_max,Tf_min:Tf_typ:Tf_max,Tz_min:Tz_typ:Tz_max) b9 (y[9],i,en);
  bufif1 #(Tr_min:Tr_typ:Tr_max,Tf_min:Tf_typ:Tf_max,Tz_min:Tz_typ:Tz_max) b8 (y[8],i,en);
  bufif1 #(Tr_min:Tr_typ:Tr_max,Tf_min:Tf_typ:Tf_max,Tz_min:Tz_typ:Tz_max) b7 (y[7],i,en);
  bufif1 #(Tr_min:Tr_typ:Tr_max,Tf_min:Tf_typ:Tf_max,Tz_min:Tz_typ:Tz_max) b6 (y[6],i,en);
  bufif1 #(Tr_min:Tr_typ:Tr_max,Tf_min:Tf_typ:Tf_max,Tz_min:Tz_typ:Tz_max) b5 (y[5],i,en);
  bufif1 #(Tr_min:Tr_typ:Tr_max,Tf_min:Tf_typ:Tf_max,Tz_min:Tz_typ:Tz_max) b4 (y[4],i,en);
  bufif1 #(Tr_min:Tr_typ:Tr_max,Tf_min:Tf_typ:Tf_max,Tz_min:Tz_typ:Tz_max) b3 (y[3],i,en);
  bufif1 #(Tr_min:Tr_typ:Tr_max,Tf_min:Tf_typ:Tf_max,Tz_min:Tz_typ:Tz_max) b2 (y[2],i,en);
  bufif1 #(Tr_min:Tr_typ:Tr_max,Tf_min:Tf_typ:Tf_max,Tz_min:Tz_typ:Tz_max) b1 (y[1],i,en);
  bufif1 #(Tr_min:Tr_typ:Tr_max,Tf_min:Tf_typ:Tf_max,Tz_min:Tz_typ:Tz_max) b0 (y[0],i,en);
endmodule
Figure 5
`define Tr_min 2
`define Tr_typ 5
`define Tr_max 8
`define Tf_min 1
`define Tf_typ 4
`define Tf_max 7
`define Tz_min 3
`define Tz_typ 6
`define Tz_max 9

module definechk (y, i, en);
  output [9:0] y;
  input        i, en;

  bufif1 #(`Tr_min:`Tr_typ:`Tr_max,`Tf_min:`Tf_typ:`Tf_max,`Tz_min:`Tz_typ:`Tz_max) b9 (y[9], i, en);
  bufif1 #(`Tr_min:`Tr_typ:`Tr_max,`Tf_min:`Tf_typ:`Tf_max,`Tz_min:`Tz_typ:`Tz_max) b8 (y[8], i, en);
  bufif1 #(`Tr_min:`Tr_typ:`Tr_max,`Tf_min:`Tf_typ:`Tf_max,`Tz_min:`Tz_typ:`Tz_max) b7 (y[7], i, en);
  bufif1 #(`Tr_min:`Tr_typ:`Tr_max,`Tf_min:`Tf_typ:`Tf_max,`Tz_min:`Tz_typ:`Tz_max) b6 (y[6], i, en);
  bufif1 #(`Tr_min:`Tr_typ:`Tr_max,`Tf_min:`Tf_typ:`Tf_max,`Tz_min:`Tz_typ:`Tz_max) b5 (y[5], i, en);
  bufif1 #(`Tr_min:`Tr_typ:`Tr_max,`Tf_min:`Tf_typ:`Tf_max,`Tz_min:`Tz_typ:`Tz_max) b4 (y[4], i, en);
  bufif1 #(`Tr_min:`Tr_typ:`Tr_max,`Tf_min:`Tf_typ:`Tf_max,`Tz_min:`Tz_typ:`Tz_max) b3 (y[3], i, en);
  bufif1 #(`Tr_min:`Tr_typ:`Tr_max,`Tf_min:`Tf_typ:`Tf_max,`Tz_min:`Tz_typ:`Tz_max) b2 (y[2], i, en);
  bufif1 #(`Tr_min:`Tr_typ:`Tr_max,`Tf_min:`Tf_typ:`Tf_max,`Tz_min:`Tz_typ:`Tz_max) b1 (y[1], i, en);
  bufif1 #(`Tr_min:`Tr_typ:`Tr_max,`Tf_min:`Tf_typ:`Tf_max,`Tz_min:`Tz_typ:`Tz_max) b0 (y[0], i, en);
endmodule
Figure 6
`define CNT 1000000
`define cycle 20
`timescale 1ns / 1ns
module tb_define;
  wire [9:0] y9, y8, y7, y6, y5, y4, y3, y2, y1, y0;
  reg        i, en;
  reg        clk;

  definechk i9 (y9, i, en);
  definechk i8 (y8, i, en);
  definechk i7 (y7, i, en);
  definechk i6 (y6, i, en);
  definechk i5 (y5, i, en);
  definechk i4 (y4, i, en);
  definechk i3 (y3, i, en);
  definechk i2 (y2, i, en);
  definechk i1 (y1, i, en);
  definechk i0 (y0, i, en);

  initial begin
    clk = 0;
    forever #(`cycle/2) clk = ~clk;
  end

  initial begin
    i  = 0;
    en = 1;
    repeat (`CNT) begin
      @(negedge clk) i = ~i;
    end
    @(negedge clk) i = ~i;
    repeat (`CNT) begin
      @(negedge clk) en = ~en;
    end
    @(negedge clk) i = ~i;
    repeat (`CNT) begin
      @(negedge clk) en = ~en;
    end
    `ifdef RUN
      @(negedge clk) $finish(2);
    `else
      @(negedge clk) $stop(2);
    `endif
  end
endmodule
Figure 7
`define CNT 1000000
`define cycle 20
`timescale 1ns / 1ns
module tb_defparam;
  wire [9:0] y9, y8, y7, y6, y5, y4, y3, y2, y1, y0;
  reg        i, en, clk;

  paramchk i9 (y9, i, en);
  defparam i9.Tr_min=0; defparam i9.Tr_typ=1; defparam i9.Tr_max=2;
  defparam i9.Tf_min=3; defparam i9.Tf_typ=4; defparam i9.Tf_max=5;
  defparam i9.Tz_min=6; defparam i9.Tz_typ=7; defparam i9.Tz_max=8;
  paramchk i8 (y8, i, en);
  defparam i8.Tr_min=0; defparam i8.Tr_typ=1; defparam i8.Tr_max=2;
  defparam i8.Tf_min=3; defparam i8.Tf_typ=4; defparam i8.Tf_max=5;
  defparam i8.Tz_min=6; defparam i8.Tz_typ=7; defparam i8.Tz_max=8;
  paramchk i7 (y7, i, en);
  defparam i7.Tr_min=0; defparam i7.Tr_typ=1; defparam i7.Tr_max=2;
  defparam i7.Tf_min=3; defparam i7.Tf_typ=4; defparam i7.Tf_max=5;
  defparam i7.Tz_min=6; defparam i7.Tz_typ=7; defparam i7.Tz_max=8;
  paramchk i6 (y6, i, en);
  defparam i6.Tr_min=0; defparam i6.Tr_typ=1; defparam i6.Tr_max=2;
  defparam i6.Tf_min=3; defparam i6.Tf_typ=4; defparam i6.Tf_max=5;
  defparam i6.Tz_min=6; defparam i6.Tz_typ=7; defparam i6.Tz_max=8;
  paramchk i5 (y5, i, en);
  defparam i5.Tr_min=0; defparam i5.Tr_typ=1; defparam i5.Tr_max=2;
  defparam i5.Tf_min=3; defparam i5.Tf_typ=4; defparam i5.Tf_max=5;
  defparam i5.Tz_min=6; defparam i5.Tz_typ=7; defparam i5.Tz_max=8;
  paramchk i4 (y4, i, en);
  defparam i4.Tr_min=0; defparam i4.Tr_typ=1; defparam i4.Tr_max=2;
  defparam i4.Tf_min=3; defparam i4.Tf_typ=4; defparam i4.Tf_max=5;
  defparam i4.Tz_min=6; defparam i4.Tz_typ=7; defparam i4.Tz_max=8;
  paramchk i3 (y3, i, en);
  defparam i3.Tr_min=0; defparam i3.Tr_typ=1; defparam i3.Tr_max=2;
  defparam i3.Tf_min=3; defparam i3.Tf_typ=4; defparam i3.Tf_max=5;
  defparam i3.Tz_min=6; defparam i3.Tz_typ=7; defparam i3.Tz_max=8;
  paramchk i2 (y2, i, en);
  defparam i2.Tr_min=0; defparam i2.Tr_typ=1; defparam i2.Tr_max=2;
  defparam i2.Tf_min=3; defparam i2.Tf_typ=4; defparam i2.Tf_max=5;
  defparam i2.Tz_min=6; defparam i2.Tz_typ=7; defparam i2.Tz_max=8;
  paramchk i1 (y1, i, en);
  defparam i1.Tr_min=0; defparam i1.Tr_typ=1; defparam i1.Tr_max=2;
  defparam i1.Tf_min=3; defparam i1.Tf_typ=4; defparam i1.Tf_max=5;
  defparam i1.Tz_min=6; defparam i1.Tz_typ=7; defparam i1.Tz_max=8;
  paramchk i0 (y0, i, en);
  defparam i0.Tr_min=0; defparam i0.Tr_typ=1; defparam i0.Tr_max=2;
  defparam i0.Tf_min=3; defparam i0.Tf_typ=4; defparam i0.Tf_max=5;
  defparam i0.Tz_min=6; defparam i0.Tz_typ=7; defparam i0.Tz_max=8;

  initial begin
    clk = 0;
    forever #(`cycle/2) clk = ~clk;
  end

  initial begin
    i  = 0;
    en = 1;
    repeat (`CNT) @(negedge clk) i = ~i;
    @(negedge clk) i = ~i;
    repeat (`CNT) @(negedge clk) en = ~en;
    @(negedge clk) i = ~i;
    repeat (`CNT) @(negedge clk) en = ~en;
    `ifdef RUN
      @(negedge clk) $finish(2);
    `else
      @(negedge clk) $stop(2);
    `endif
  end
endmodule
Figure 8
`define CNT 1000000
`define cycle 20
`timescale 1ns / 1ns
module tb_param;
  wire [9:0] y9, y8, y7, y6, y5, y4, y3, y2, y1, y0;
  reg        i, en;
  reg        clk;

  paramchk #(0,1,2,3,4,5,6,7,8) i9 (y9, i, en);
  paramchk #(0,1,2,3,4,5,6,7,8) i8 (y8, i, en);
  paramchk #(0,1,2,3,4,5,6,7,8) i7 (y7, i, en);
  paramchk #(0,1,2,3,4,5,6,7,8) i6 (y6, i, en);
  paramchk #(0,1,2,3,4,5,6,7,8) i5 (y5, i, en);
  paramchk #(0,1,2,3,4,5,6,7,8) i4 (y4, i, en);
  paramchk #(0,1,2,3,4,5,6,7,8) i3 (y3, i, en);
  paramchk #(0,1,2,3,4,5,6,7,8) i2 (y2, i, en);
  paramchk #(0,1,2,3,4,5,6,7,8) i1 (y1, i, en);
  paramchk #(0,1,2,3,4,5,6,7,8) i0 (y0, i, en);

  initial begin
    clk = 0;
    forever #(`cycle/2) clk = ~clk;
  end

  initial begin
    i  = 0;
    en = 1;
    repeat (`CNT) begin
      @(negedge clk) i = ~i;
    end
    @(negedge clk) i = ~i;
    repeat (`CNT) begin
      @(negedge clk) en = ~en;
    end
    @(negedge clk) i = ~i;
    repeat (`CNT) begin
      @(negedge clk) en = ~en;
    end
    `ifdef RUN
      @(negedge clk) $finish(2);
    `else
      @(negedge clk) $stop(2);
    `endif
  end
endmodule
Figure 9
Sparc5/32MB RAM 491MB Swap Solaris version 2.5 Verilog-XL 2.2.1 - Undertow 5.3.3 define paramredef defparam
Table 3
`define ICNT 10000000
`define cycle 100
`timescale 1ns / 100ps
module AlwaysGroup;
  reg       clk;
  reg [7:0] a, b, c, d, e;

  initial begin
    clk = 0;
    forever #(`cycle) clk = ~clk;
  end

  initial begin
    a = 8'haa;
    forever @(negedge clk) a = ~ a;
  end

  initial begin
    repeat(`ICNT) @(posedge clk);
    `ifdef RUN
      @(posedge clk) $finish(2);
    `else
      @(posedge clk) $stop(2);
    `endif
  end

  `ifdef GROUP4
    // Group of four always blocks
    always @(posedge clk) begin
      b <= a;
    end
    always @(posedge clk) begin
      c <= b;
    end
    always @(posedge clk) begin
      d <= c;
    end
    always @(posedge clk) begin
      e <= d;
    end
  `else
    // Four assignments grouped into a
    // single always block
    always @(posedge clk) begin
      b <= a;
      c <= b;
      d <= c;
      e <= d;
    end
  `endif
endmodule
Figure 10
Sparc5/32MB RAM 491MB Swap Solaris version 2.5 Verilog-XL 2.2.1 - Undertow 5.3.3 group1 group4
Table 4
7. Port Connections
Question: Is there a penalty for passing data over module ports as opposed to passing data by hierarchical reference? The idea for this benchmark came from a paper presented by Martin Gravenstein[1] of Ford Microelectronics at the 1994 International Verilog Conference. The testcase for this benchmark is a pair of flip-flops. The first flip-flop has no ports and testbench communication with this model was conducted by hierarchical reference. The second flip-flop is a model with normal port communication with a testbench (Figure 11).
`ifdef NOPORTS
module PortModels;
  reg [15:0] q;
  reg [15:0] d;
  reg        clk, rstN;

  always @(posedge clk or negedge rstN)
    if (rstN == 0) q <= 0;
    else           q <= d;
endmodule
`else
module PortModels (q, d, clk, rstN);
  output [15:0] q;
  input  [15:0] d;
  input         clk, rstN;
  reg    [15:0] q;

  always @(posedge clk or negedge rstN)
    if (rstN == 0) q <= 0;
    else           q <= d;
endmodule
`endif
7.1 Ports/No Ports Efficiency Summary
The results in Table 5 show that communicating with a four-port model, as opposed to referencing the ports hierarchically, required about 46% more simulation time. Model port usage and communication is still recommended; however, passing monitor data over ports would be simulation-time expensive. It is better to reference monitored state and bus data hierarchically. These results also suggest that models with extra levels of hierarchy will significantly slow down a simulation.
Figure 11
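The testbench that communicates with the port-less model by hierarchical reference is not shown. A minimal sketch of the idea follows, assuming the model is compiled with NOPORTS defined. The testbench name tb_noports, the instance name u1, the clock period, and the data value are hypothetical.

```verilog
// Hypothetical testbench sketch; compile together with the PortModels
// source and with NOPORTS defined so the port-less model is selected.
`timescale 1ns / 1ns
module tb_noports;
  // Port-less instance: there are no port connections to make
  PortModels u1 ();

  // Drive the model's internal clock register by hierarchical reference
  initial begin
    u1.clk = 0;
    forever #50 u1.clk = ~u1.clk;
  end

  // Drive reset and data, then read q back, all by hierarchical reference
  initial begin
    u1.rstN = 0;
    u1.d    = 16'h0000;
    @(negedge u1.clk) u1.rstN = 1;
    @(negedge u1.clk) u1.d = 16'hA5A5;
    @(negedge u1.clk) $display("q = %h", u1.q);
    $finish;
  end
endmodule
```

The same hierarchical path style (for example u1.q) can be used from a monitor to observe state inside a ported model without adding monitor ports.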
Sparc5/32MB RAM 491MB Swap Solaris version 2.5 Verilog-XL 2.2.1 - Undertow 5.3.3 NoPorts Ports
Table 5
8. `timescale Efficiencies
Question: Are there any simulator performance penalties for using a higher precision `timescale during simulation? The testcase for this benchmark is a buffer and inverter with propagation delays that are simulated with a variety of `timescales (Figure 12).
8.1 `timescale Efficiency Summary
The results in Table 6 show that using a `timescale of 1ns/1ps requires about 156% more memory and about 99% more time to simulate than the same model using a `timescale of 1ns/1ns.
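To illustrate what the precision argument of `timescale changes, the sketch below reuses the buffer delays from the benchmark model. Delay values are rounded to the time precision, so with 1ns/100ps the 2.201 ns rise delay becomes 2.2 ns and the 3.667 ns fall delay becomes 3.7 ns. With 1ns/1ns they would round to 2 ns and 4 ns, and with 1ns/1ps they are kept as written. The module name TimescaleRounding, the instance name b1, and the stimulus are assumptions made for illustration.

```verilog
// Hypothetical example; change the precision below to see the buffer
// delays round differently.
`timescale 1ns / 100ps
module TimescaleRounding;
  reg  i;
  wire y;

  // Rise delay 2.201 ns, fall delay 3.667 ns; both are rounded to the
  // time precision declared by the `timescale directive above.
  buf #(2.201, 3.667) b1 (y, i);

  initial begin
    // Report times in ns with three decimal places
    $timeformat(-9, 3, " ns", 12);
    i = 0;
    #20 i = 1;
    #20 i = 0;
    #20 $finish;
  end

  // Print the time of every change on the buffer output
  always @(y)
    $display("y = %b at %t", y, $realtime);
endmodule
```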
`timescale 1ns / 1ns `timescale 1ns / 100ps `timescale 1ns / 10ps `timescale 1ns / 1ps
// `cycle and `ICNT are not defined in this listing
module TimeModel;
  reg        i;
  wire [1:2] y;

  initial begin
    i = 0;
    forever #(`cycle) i = ~i;
  end

  initial begin
    repeat(`ICNT) @(posedge i);
    `ifdef RUN
      @(posedge i) $finish(2);
    `else
      @(posedge i) $stop(2);
    `endif
  end

  buf #(2.201, 3.667) i1 (y[1], i);
  not #(4.633, 7.499) i2 (y[2], i);
endmodule
Figure 12

`timescale      Memory Usage    Simulation CPU Time    CPU Simulation
                Percentages     (in seconds)           Percentages
1ns / 1ns       100.00%         1395.0                 100.00%
1ns / 100ps     100.00%         1459.4                 104.62%
1ns / 10ps      110.37%         1718.1                 123.16%
1ns / 1ps       255.58%         2777.8                 199.13%
Table 6

The results in Table 7 show that needless display of $time values is very costly in simulation time.

Simulation CPU Time (in seconds): 906.6, 1297.3, 1646.1, 1624.5, 1348.4, 1328.1
`define hcycle 12.5
`define ICNT 1000000
`timescale 1ns / 100ps
module DisplayTime;
  reg  clk;
  time rClkTime, fClkTime;
  real rRealTime, fRealTime;

  initial begin
    clk = 0;
    forever #(`hcycle) clk = ~clk;
  end

  initial begin
    repeat(`ICNT) @(posedge clk);
    `ifdef RUN
      @(posedge clk) $finish(2);
    `else
      @(posedge clk) $stop(2);
    `endif
  end

  `ifdef NoDisplay
    // display with no time values
    always @(negedge clk) begin
      fClkTime = $time;
      $display ("Negedge Clk");
    end
  `endif

  `ifdef OneDisplay
    // display with one time value
    always @(negedge clk) begin
      fClkTime = $time;
      $display ("Negedge Clk at %d", fClkTime);
    end
  `endif

  `ifdef TwoDisplay
    // display with two time values
    always @(negedge clk) begin
      fClkTime = $time;
      $display ("Posedge Clk at %d - Negedge Clk at %d", rClkTime, fClkTime);
    end
  `endif

  `ifdef TwoDisplay0
    // display with two time values
    always @(negedge clk) begin
      fClkTime = $time;
      $display ("Posedge Clk at %0d - Negedge Clk at %0d", rClkTime, fClkTime);
    end
  `endif

  `ifdef TwoFormat0
    // display with two time values
    always @(negedge clk) begin
      fClkTime = $time;
      $display ("Posedge Clk at %0t - Negedge Clk at %0t", rClkTime, fClkTime);
    end
  `endif

  `ifdef TwoRtimeFormat0
    // display with two time values
    initial $timeformat(-9,2,"ns",15);
    always @(negedge clk) begin
      fRealTime = $realtime;
      $display ("Posedge Clk at %0t - Negedge Clk at %0t", rRealTime,fRealTime);
    end
  `endif

  `ifdef TwoRtimeFormat0
    always @(posedge clk) rRealTime = $realtime;
  `else
    always @(posedge clk) rClkTime = $time;
  `endif
endmodule
Figure 13
`define ICNT 100_000
`define cycle 100
`timescale 1ns / 1ns
module Clocks;
  reg d, rstN;

  `ifdef ALWAYS
    reg clk;   // driven by a procedural block
    initial clk = 0;
    always     // free running behave clk #1
      #(`cycle/2) clk = ~clk;
  `endif

  `ifdef FOREVER
    reg clk;   // driven by a procedural block
    initial begin
      clk = 0;
      forever #(`cycle/2) clk = ~clk;
    end
  `endif

  `ifdef GATE
    // free running clk #3 (gate)
    reg  start;
    wire clk;  // driven by a gate
    initial begin
      start = 0;
      #(`cycle/2) start = 1;
    end
    nand #(`cycle/2) (clk, clk, start);
  `endif

  dff d1 (q, d, clk, rstN);

  initial begin
    rstN = 0;
    d = 1;
    @(negedge clk) rstN = 1;
    repeat(`ICNT) @(posedge clk);
    `ifdef RUN
      @(posedge clk) $finish(2);
    `else
      @(posedge clk) $stop(2);
    `endif
  end

  // Veritools Undertow-dumpfile option
  `ifdef UT
    initial begin
      $dumpfile("dump.ut");
      $vtDumpvars;
    end
  `endif
endmodule
12. References
[1] M. Gravenstein, "Modeling Techniques to Support System Level Simulation and a Top-Down Development Methodology," International Verilog HDL Conference Proceedings 1994, pp 43-50
[2] J. Lawrence & C. Ussery, "INCA: A Next-Generation Architecture for Simulation," International Cadence Users Group Conference Proceedings 1995, pp 105-109
Figure 14
Sparc5/32MB RAM, 491MB Swap, Solaris version 2.5, Verilog-XL 2.2.1 - Undertow 5.3.3

             Data Structure        Simulation CPU Time
             (bytes of memory)     (in seconds)
Always       88104                 1359.2
Forever      88168                 1371.0
Gate         88588                 2562.7
AlwaysUT     90368                 2446.5
ForeverUT    90432                 2481.5
GateUT       90696                 3448.8
Table 8