I. Principles of the Pipelined SM4 Implementation in Verilog
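SM4 is a block cipher standard published by China's State Cryptography Administration. It operates on 128-bit blocks with a 128-bit key and uses a 32-round nonlinear iterative structure. A pipelined Verilog implementation splits the algorithm into processing stages, each executed by dedicated hardware, so that several blocks are processed at the same time. This substantially increases throughput.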
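For reference, the iteration that the pipeline unrolls can be written out as follows, in the notation of the SM4 standard (GB/T 32907-2016). Here S is the 8-bit S-box, ⋘ is a 32-bit left rotation and rk_i is the i-th round key. Each application of the X_{i+4} equation corresponds to one round stage in hardware, and the final reversal R needs only wiring.

```latex
\begin{align*}
(X_0, X_1, X_2, X_3) &= \text{plaintext} \\
X_{i+4} &= X_i \oplus T(X_{i+1} \oplus X_{i+2} \oplus X_{i+3} \oplus rk_i), \quad i = 0, 1, \dots, 31 \\
T(A) &= L(\tau(A)), \qquad \tau(a_0, a_1, a_2, a_3) = \bigl(S(a_0), S(a_1), S(a_2), S(a_3)\bigr) \\
L(B) &= B \oplus (B \lll 2) \oplus (B \lll 10) \oplus (B \lll 18) \oplus (B \lll 24) \\
(Y_0, Y_1, Y_2, Y_3) &= R(X_{32}, X_{33}, X_{34}, X_{35}) = (X_{35}, X_{34}, X_{33}, X_{32})
\end{align*}
```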
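The key idea of the pipeline is to unroll the 32 encryption rounds into 32 consecutive hardware units separated by registers. On every clock cycle, data advances from one unit to the next, so the system can hold many blocks at different stages of encryption at once. Once the pipeline is full, one 128-bit block completes on every clock cycle, even though each individual block still needs 32 cycles to pass through.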
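The throughput argument can be made concrete with a little arithmetic. The sketch below assumes the register structure of the encrypt module listed later (33 registers, X0_3[0] to X0_3[32]), back-to-back input with no gaps, and, purely for illustration, a 100 MHz clock; it is not a synthesis result. The iterative figure describes a hypothetical core that reuses a single round unit and is included only for comparison.

```latex
\begin{align*}
\text{latency} &= 32 \text{ cycles (from the edge that samples din\_valid to dout\_valid)} \\
t_N^{\text{pipe}} &= 32 + (N - 1) \text{ cycles until the } N\text{-th back-to-back block is valid} \\
t_N^{\text{iter}} &\approx 32N \text{ cycles (hypothetical iterative core, one round unit)} \\
\text{throughput} &= 128\ \text{bit} \times f_{clk}
  \;\Rightarrow\; 128 \times 100\ \text{MHz} = 12.8\ \text{Gbit/s (assumed } f_{clk}\text{)}
\end{align*}
```

For N = 10 blocks this gives 41 cycles for the pipeline versus roughly 320 cycles for the iterative variant; the price is 32 copies of the round logic.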
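The design has two main parts: the key expansion module and the encryption module. The key expansion module precomputes the 32 round keys and stores them in registers, and the encryption module uses these round keys in its 32 unrolled round stages. Separating the two means the key is expanded once and then reused for every subsequent block. The two do not run concurrently, however: the round-key registers shift while expansion is in progress, so encryption must wait until key expansion has finished, and the key must not be reloaded while blocks are still in the pipeline.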
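Pipeline control is handled by status signals: busy, din_valid and dout_valid coordinate the modules. busy is low only when the system is completely idle, meaning no key expansion is running and no block is in the pipeline; this is the safe moment to load a new key. Plaintext blocks, by contrast, can be issued on every cycle by asserting din_valid once key expansion has completed, even while busy is still high because of earlier blocks. dout_valid marks the cycles in which ciphertext carries a valid result. This simple valid-based protocol (there is no backpressure or stall signal) keeps data flowing correctly through the pipeline.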
II. Verilog Code Walkthrough
1. The sm4_top module
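sm4_top is the top-level module of the system; it instantiates and connects the key expansion and encryption modules. Its interface consists of the clock, reset, master-key loading, plaintext input and ciphertext output. The hierarchy keeps key handling separate from data encryption.
The module reports its state through busy, which is the OR of the busy signals from the key expansion and encryption modules. An external controller can therefore easily tell whether the system is idle and time its inputs accordingly.
The interface follows a standard synchronous design: apart from the asynchronous active-low reset rst_n, all signals are sampled on the rising clock edge. load_mkey starts loading the master key, din_valid starts the encryption of a block, and dout_valid marks valid output data, which completes the data-processing flow.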
module sm4_top (
    input          clk,
    input          rst_n,        // asynchronous, active-low reset
    input  [127:0] mkey,         // 128-bit master key
    input          load_mkey,    // starts key expansion
    input  [127:0] plaintext,
    input          din_valid,    // plaintext valid, at most one block per cycle
    output [127:0] ciphertext,
    output         dout_valid,   // ciphertext valid
    output         busy          // key expansion running or blocks in flight
);
    wire [1023:0] rk_flatten;    // 32 round keys, rk0 in bits [1023:992]
    wire          encrypt_busy;
    wire          key_expand_busy;
    assign busy = encrypt_busy | key_expand_busy;
    encrypt u_en (
        .clk(clk),
        .rst_n(rst_n),
        .din_valid(din_valid),
        .plaintext(plaintext),
        .ciphertext(ciphertext),
        .dout_valid(dout_valid),
        .busy(encrypt_busy),
        .rk_flatten(rk_flatten)
    );
    key_expand u_ke (
        .clk(clk),
        .rst_n(rst_n),
        .load_mkey(load_mkey),
        .mkey(mkey),
        .rk_flatten(rk_flatten),
        .busy(key_expand_busy)
    );
endmodule
2. The key_expand module
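The key_expand module generates 32 round keys of 32 bits each from the 128-bit master key. It first XORs the master key with the fixed system parameter FK and then produces one round key per iteration over 32 iterations, each using a different fixed constant CK_i (cki in the code).
Internally, the intermediate key state K0 to K3 is kept in a shift-register structure, and the round_keys array is itself a shift register into which each new round key is shifted. The round_counter counter tracks the iterations, and busy stays high for the 32 clock cycles of key generation. The finished round keys are passed to the encryption module over the rk_flatten bus. Because start_expand is load_mkey & ~busy, load_mkey should be a one-cycle pulse; if it is held high, a new expansion starts as soon as the previous one ends.
The core of the key expansion is the key_expand_round submodule, which implements SM4's nonlinear key-expansion transform T'. Like the encryption round function, it applies the S-box to every byte and then a linear transform, but it uses the different linear transform L' (rotations by 13 and 23 bits instead of 2, 10, 18 and 24) and takes the constant CK_i in place of a round key. This provides confusion and diffusion of the key material.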
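Written out in the notation of the SM4 standard, the key expansion performed by key_expand and key_expand_round is as follows. MK_0 to MK_3 and FK_0 to FK_3 are the four 32-bit words of mkey and FK, most significant word first, and τ is the same byte-wise S-box layer as in the round function.

```latex
\begin{align*}
(K_0, K_1, K_2, K_3) &= (MK_0 \oplus FK_0,\ MK_1 \oplus FK_1,\ MK_2 \oplus FK_2,\ MK_3 \oplus FK_3) \\
rk_i = K_{i+4} &= K_i \oplus T'(K_{i+1} \oplus K_{i+2} \oplus K_{i+3} \oplus CK_i), \quad i = 0, 1, \dots, 31 \\
T'(A) &= L'(\tau(A)), \qquad L'(B) = B \oplus (B \lll 13) \oplus (B \lll 23) \\
CK_i &= (ck_{i,0}, ck_{i,1}, ck_{i,2}, ck_{i,3}), \qquad ck_{i,j} = (4i + j) \times 7 \bmod 256
\end{align*}
```

As a worked example, the master key 0123456789abcdeffedcba9876543210 used in the testbench gives (K_0, K_1, K_2, K_3) = (a292ffa1, df01febf, 99a12b0f, c42410cc) after the XOR with FK = (a3b1bac6, 56aa3350, 677d9197, b27022dc); this is the value loaded into {K0, K1, K2, K3} on start_expand.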
module key_expand (
    input           clk,
    input           rst_n,
    input           load_mkey,        // one-cycle pulse starts an expansion
    input  [127:0]  mkey,
    output [1023:0] rk_flatten,       // {rk0, rk1, ..., rk31}, rk0 in bits [1023:992]
    output reg      busy              // high for the 32 cycles of expansion
);
    localparam FK = 128'ha3b1bac656aa3350677d9197b27022dc;
    reg  [4:0]  round_counter;        // round index 0..31 while busy
    reg  [31:0] K0, K1, K2, K3;       // sliding window (K_i, K_i+1, K_i+2, K_i+3)
    reg  [31:0] round_keys [0:31];
    wire [31:0] cki;                  // CK_i for the current round
    wire [31:0] round_dout;           // K_i+4 = rk_i
    wire        start_expand = load_mkey & ~busy;
    integer i;
    assign rk_flatten = {round_keys[0],  round_keys[1],  round_keys[2],  round_keys[3],
                         round_keys[4],  round_keys[5],  round_keys[6],  round_keys[7],
                         round_keys[8],  round_keys[9],  round_keys[10], round_keys[11],
                         round_keys[12], round_keys[13], round_keys[14], round_keys[15],
                         round_keys[16], round_keys[17], round_keys[18], round_keys[19],
                         round_keys[20], round_keys[21], round_keys[22], round_keys[23],
                         round_keys[24], round_keys[25], round_keys[26], round_keys[27],
                         round_keys[28], round_keys[29], round_keys[30], round_keys[31]};
    // Key state: load MK ^ FK, then shift in one new key word per cycle.
    always @(posedge clk, negedge rst_n) begin
        if (~rst_n) {K0, K1, K2, K3} <= 128'h0;
        else begin
            if (start_expand) {K0, K1, K2, K3} <= mkey ^ FK;
            else if (busy)    {K0, K1, K2, K3} <= {K1, K2, K3, round_dout};
        end
    end
    // Round-key shift register: rk_i enters at index 31 in cycle i and
    // has moved down to index i when the 32 cycles are over.
    always @(posedge clk) begin
        if (busy) begin
            round_keys[31] <= round_dout;
            for (i = 30; i >= 0; i = i - 1) begin
                round_keys[i] <= round_keys[i+1];
            end
        end
    end
    always @(posedge clk, negedge rst_n) begin
        if (~rst_n) round_counter <= 5'd0;
        else begin
            if (busy) round_counter <= round_counter + 5'd1;   // wraps from 31 to 0
            else if (start_expand) round_counter <= 5'd0;
        end
    end
    always @(posedge clk, negedge rst_n) begin
        if (~rst_n) busy <= 1'b0;
        else begin
            if (start_expand) busy <= 1'b1;
            else if (round_counter == 5'd31) busy <= 1'b0;     // last round done
        end
    end
    key_expand_cki   u_ck    (.round(round_counter), .cki(cki));
    key_expand_round u_round (.din({K0, K1, K2, K3}), .cki(cki), .dout(round_dout));
endmodule
3. The encrypt module
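The encrypt module implements SM4's 32-round encryption pipeline. It takes the plaintext input and the round keys on rk_flatten and produces the ciphertext. Internally it contains 32 encrypt_round instances that form the complete pipeline.
The module tracks each block's progress through the pipeline with the round_ctrl shift register. The X0_3 array stores the intermediate state after each round; after the 32nd round, the final ciphertext is obtained by reversing the word order (the reverse transform R). dout_valid is asserted 32 clock cycles after the edge that samples din_valid, indicating that the output is valid.
The control logic loads X0_3[0] only when din_valid is high, so each new block enters the pipeline cleanly. Stages that hold no valid block still compute, but their results are ignored because the matching round_ctrl bit is 0. busy combines din_valid with the round_ctrl bits and therefore reflects whether any block is in flight. The design accepts back-to-back input, one block per clock cycle, which maximizes throughput.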
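To see why dout_valid appears 32 cycles after din_valid is sampled, the round_ctrl logic can be isolated. The module below is a hypothetical, parameterized stand-alone copy of that valid shift register; valid_pipe and STAGES are names introduced only for this sketch and are not part of the original design. With STAGES = 33 it mirrors the 33-bit round_ctrl, in which bit j is set while X0_3[j] holds a real block.

```verilog
// Hypothetical stand-alone copy of the round_ctrl logic in the encrypt module.
// valid_pipe and STAGES are illustrative names, not part of the original design.
module valid_pipe #(
    parameter STAGES = 33   // X0_3[0] input register + 32 round registers
) (
    input  clk,
    input  rst_n,
    input  valid_in,        // corresponds to din_valid
    output valid_out        // corresponds to dout_valid = round_ctrl[32]
);
    reg [STAGES-1:0] sr;    // bit j set while stage j holds a real block
    assign valid_out = sr[STAGES-1];
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            sr <= {STAGES{1'b0}};
        else
            sr <= {sr[STAGES-2:0], valid_in};   // advance one stage per clock
    end
endmodule
```

Driving valid_in high for a single cycle and counting rising edges after the one that samples it, valid_out goes high on the 32nd edge, matching dout_valid = round_ctrl[32]. Because every bit advances independently, a new 1 can enter on every cycle, which is what allows back-to-back input.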
module encrypt (
    input           clk,
    input           rst_n,
    input           din_valid,
    input  [127:0]  plaintext,
    output [127:0]  ciphertext,
    output          dout_valid,
    output          busy,
    input  [1023:0] rk_flatten
);
    reg  [32:0]  round_ctrl;          // valid bit for each pipeline stage
    reg  [127:0] X0_3 [0:32];         // stage j holds (X_j, X_j+1, X_j+2, X_j+3)
    wire [31:0]  round_keys [0:31];
    wire [127:0] round_out [0:31];
    integer i;
    genvar j;
    assign dout_valid = round_ctrl[32];
    assign busy = din_valid | (|round_ctrl);
    // Reverse transform R: ciphertext = (X35, X34, X33, X32)
    assign ciphertext = {X0_3[32][31:0], X0_3[32][63:32], X0_3[32][95:64], X0_3[32][127:96]};
    assign {
        round_keys[0],  round_keys[1],  round_keys[2],  round_keys[3],
        round_keys[4],  round_keys[5],  round_keys[6],  round_keys[7],
        round_keys[8],  round_keys[9],  round_keys[10], round_keys[11],
        round_keys[12], round_keys[13], round_keys[14], round_keys[15],
        round_keys[16], round_keys[17], round_keys[18], round_keys[19],
        round_keys[20], round_keys[21], round_keys[22], round_keys[23],
        round_keys[24], round_keys[25], round_keys[26], round_keys[27],
        round_keys[28], round_keys[29], round_keys[30], round_keys[31]
    } = rk_flatten;
    // Data pipeline: stage 0 registers the plaintext, stage j (j >= 1) registers round j-1.
    always @(posedge clk, negedge rst_n) begin
        if (~rst_n) begin
            for (i = 0; i < 33; i = i + 1) begin
                X0_3[i] <= 128'h0;
            end
        end else begin
            if (din_valid) begin
                X0_3[0] <= plaintext;
            end
            for (i = 1; i < 33; i = i + 1) begin
                X0_3[i] <= round_out[i-1];
            end
        end
    end
    // Valid shift register: 32 old bits plus the new din_valid = 33 bits.
    always @(posedge clk, negedge rst_n) begin
        if (~rst_n) begin
            round_ctrl <= 33'd0;
        end else begin
            round_ctrl <= {round_ctrl[31:0], din_valid};
        end
    end
    generate
        for (j = 0; j < 32; j = j + 1) begin : enr_instances
            encrypt_round u_er (
                .din (X0_3[j]),
                .rki (round_keys[j]),
                .dout(round_out[j])
            );
        end
    endgenerate
endmodule
4. The encrypt_round module
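encrypt_round implements the SM4 round function: it takes 128 bits of input data and a 32-bit round key and produces a 128-bit result. The core operations are 32-bit XOR, S-box substitution and the linear transform L. Words 1 to 3 of the input are first XORed with the round key, and the result is passed through four parallel S-boxes, one per byte. The substituted word then goes through L (four rotations combined by XOR), is XORed with word 0, and becomes the new last word, while the other three words each move up by one position.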
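The rotations in encrypt_round (and in key_expand_round) are written as bit-slice concatenations, which are easy to get wrong by one bit. One way to check them is to express L and L' with an explicit rotate helper and compare the results. The module below is a sketch for that purpose; sm4_linear and rotl32 are hypothetical names introduced here, not part of the original design.

```verilog
// Hypothetical reference for the two SM4 linear transforms, used only for checking.
module sm4_linear (
    input  [31:0] b,
    output [31:0] l_enc,   // L(B)  as used in encrypt_round
    output [31:0] l_key    // L'(B) as used in key_expand_round
);
    // 32-bit rotate left by n (n in 1..31 here).
    function [31:0] rotl32;
        input [31:0] x;
        input [4:0]  n;
        begin
            rotl32 = (x << n) | (x >> (6'd32 - n));
        end
    endfunction
    assign l_enc = b ^ rotl32(b, 5'd2) ^ rotl32(b, 5'd10) ^ rotl32(b, 5'd18) ^ rotl32(b, 5'd24);
    assign l_key = b ^ rotl32(b, 5'd13) ^ rotl32(b, 5'd23);
endmodule
```

For b = 32'h00000001 the module yields l_enc = 32'h01040405 and l_key = 32'h00802001. Feeding the same word to the concatenation expressions used for transform_dout in encrypt_round and key_expand_round gives identical values, since {B[29:0], B[31:30]} is exactly B rotated left by 2, and so on for the other slices.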
module encrypt_round (
    input  [127:0] din,     // (X_i, X_i+1, X_i+2, X_i+3)
    input  [31:0]  rki,     // round key rk_i
    output [127:0] dout     // (X_i+1, X_i+2, X_i+3, X_i+4)
);
    wire [31:0] word_0, word_1, word_2, word_3;
    wire [31:0] transform_din;
    wire [31:0] transform_dout;
    wire [7:0]  sbox_bin0, sbox_bin1, sbox_bin2, sbox_bin3;
    wire [7:0]  sbox_bout0, sbox_bout1, sbox_bout2, sbox_bout3;
    wire [31:0] sbox_wout = {sbox_bout0, sbox_bout1, sbox_bout2, sbox_bout3};
    assign {word_0, word_1, word_2, word_3} = din;
    assign transform_din = word_1 ^ word_2 ^ word_3 ^ rki;
    assign {sbox_bin0, sbox_bin1, sbox_bin2, sbox_bin3} = transform_din;
    // L(B) = B ^ (B <<< 2) ^ (B <<< 10) ^ (B <<< 18) ^ (B <<< 24)
    assign transform_dout = sbox_wout
                          ^ {sbox_wout[29:0], sbox_wout[31:30]}
                          ^ {sbox_wout[21:0], sbox_wout[31:22]}
                          ^ {sbox_wout[13:0], sbox_wout[31:14]}
                          ^ {sbox_wout[7:0],  sbox_wout[31:8]};
    // X_i+4 = X_i ^ T(...); the 4-word window slides by one word
    assign dout = {word_1, word_2, word_3, transform_dout ^ word_0};
    sm4_sbox sm4_sbox0 (.s_in(sbox_bin0), .s_out(sbox_bout0));
    sm4_sbox sm4_sbox1 (.s_in(sbox_bin1), .s_out(sbox_bout1));
    sm4_sbox sm4_sbox2 (.s_in(sbox_bin2), .s_out(sbox_bout2));
    sm4_sbox sm4_sbox3 (.s_in(sbox_bin3), .s_out(sbox_bout3));
endmodule
5. Other modules
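key_expand_cki supplies the 32 round constants CK_i (cki) used in key expansion and is implemented as a lookup table. key_expand_round implements the key-expansion round function; its structure is similar to encrypt_round, but it uses a different linear transform and outputs only the new 32-bit key word. sm4_sbox implements SM4's S-box substitution with a 256-byte lookup table that provides the nonlinear transform.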
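Because the constants follow the rule ck_{i,j} = (4i + j) × 7 mod 256 from the SM4 standard, the lookup table can also be replaced by logic that computes them. The module below is a hypothetical drop-in alternative with the same ports as key_expand_cki (key_expand_cki_calc and ck_byte are names introduced here). Whether it uses fewer resources than the 32-entry table depends on the synthesis tool and is not claimed here.

```verilog
// Hypothetical arithmetic alternative to the key_expand_cki lookup table.
module key_expand_cki_calc (
    input  [4:0]  round,
    output [31:0] cki
);
    // Byte j of CK_i is (4*i + j) * 7 mod 256.
    function [7:0] ck_byte;
        input [4:0] i;
        input [1:0] j;
        reg   [7:0] idx;
        begin
            idx     = {1'b0, i, j};   // 4*i + j, at most 127
            ck_byte = idx * 8'd7;     // 8-bit result, i.e. the product mod 256
        end
    endfunction
    assign cki = {ck_byte(round, 2'd0), ck_byte(round, 2'd1),
                  ck_byte(round, 2'd2), ck_byte(round, 2'd3)};
endmodule
```

For round = 5'd9, for example, the bytes are 252, 259, 266 and 273 modulo 256, giving 32'hfc030a11, the same value as the 5'h09 entry of the table.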
module key_expand_cki (
    input      [4:0]  round,
    output reg [31:0] cki
);
    // Combinational lookup table: byte j of CK_i is (4*i + j) * 7 mod 256.
    always @(*)
        case (round)
            5'h00: cki = 32'h00070e15;
            5'h01: cki = 32'h1c232a31;
            5'h02: cki = 32'h383f464d;
            5'h03: cki = 32'h545b6269;
            5'h04: cki = 32'h70777e85;
            5'h05: cki = 32'h8c939aa1;
            5'h06: cki = 32'ha8afb6bd;
            5'h07: cki = 32'hc4cbd2d9;
            5'h08: cki = 32'he0e7eef5;
            5'h09: cki = 32'hfc030a11;
            5'h0a: cki = 32'h181f262d;
            5'h0b: cki = 32'h343b4249;
            5'h0c: cki = 32'h50575e65;
            5'h0d: cki = 32'h6c737a81;
            5'h0e: cki = 32'h888f969d;
            5'h0f: cki = 32'ha4abb2b9;
            5'h10: cki = 32'hc0c7ced5;
            5'h11: cki = 32'hdce3eaf1;
            5'h12: cki = 32'hf8ff060d;
            5'h13: cki = 32'h141b2229;
            5'h14: cki = 32'h30373e45;
            5'h15: cki = 32'h4c535a61;
            5'h16: cki = 32'h686f767d;
            5'h17: cki = 32'h848b9299;
            5'h18: cki = 32'ha0a7aeb5;
            5'h19: cki = 32'hbcc3cad1;
            5'h1a: cki = 32'hd8dfe6ed;
            5'h1b: cki = 32'hf4fb0209;
            5'h1c: cki = 32'h10171e25;
            5'h1d: cki = 32'h2c333a41;
            5'h1e: cki = 32'h484f565d;
            5'h1f: cki = 32'h646b7279;
            default: cki = 32'h0;
        endcase
endmodule
module key_expand_round (
    input  [127:0] din,     // (K_i, K_i+1, K_i+2, K_i+3)
    input  [31:0]  cki,     // constant CK_i
    output [31:0]  dout     // K_i+4 = rk_i
);
    wire [31:0] word_0, word_1, word_2, word_3;
    wire [31:0] transform_din;
    wire [31:0] transform_dout;
    wire [7:0]  sbox_bin0, sbox_bin1, sbox_bin2, sbox_bin3;
    wire [7:0]  sbox_bout0, sbox_bout1, sbox_bout2, sbox_bout3;
    wire [31:0] sbox_wout = {sbox_bout0, sbox_bout1, sbox_bout2, sbox_bout3};
    assign {word_0, word_1, word_2, word_3} = din;
    assign transform_din = word_1 ^ word_2 ^ word_3 ^ cki;
    assign {sbox_bin0, sbox_bin1, sbox_bin2, sbox_bin3} = transform_din;
    // L'(B) = B ^ (B <<< 13) ^ (B <<< 23)
    assign transform_dout = sbox_wout
                          ^ {sbox_wout[18:0], sbox_wout[31:19]}
                          ^ {sbox_wout[8:0],  sbox_wout[31:9]};
    assign dout = transform_dout ^ word_0;
    sm4_sbox sm4_sbox0 (.s_in(sbox_bin0), .s_out(sbox_bout0));
    sm4_sbox sm4_sbox1 (.s_in(sbox_bin1), .s_out(sbox_bout1));
    sm4_sbox sm4_sbox2 (.s_in(sbox_bin2), .s_out(sbox_bout2));
    sm4_sbox sm4_sbox3 (.s_in(sbox_bin3), .s_out(sbox_bout3));
endmodule
module sm4_sbox(
input [7:0] s_in,
output [7:0] s_out
);
reg [7:0] sbox[0:255];
initial
begin
sbox[000]=8'hd6; sbox[001]=8'h90; sbox[002]=8'he9; sbox[003]=8'hfe; sbox[004]=8'hcc; sbox[005]=8'he1; sbox[006]=8'h3d; sbox[007]=8'hb7;
sbox[008]=8'h16; sbox[009]=8'hb6; sbox[010]=8'h14; sbox[011]=8'hc2; sbox[012]=8'h28; sbox[013]=8'hfb; sbox[014]=8'h2c; sbox[015]=8'h05;
sbox[016]=8'h2b; sbox[017]=8'h67; sbox[018]=8'h9a; sbox[019]=8'h76; sbox[020]=8'h2a; sbox[021]=8'hbe; sbox[022]=8'h04; sbox[023]=8'hc3;
sbox[024]=8'haa; sbox[025]=8'h44; sbox[026]=8'h13; sbox[027]=8'h26; sbox[028]=8'h49; sbox[029]=8'h86; sbox[030]=8'h06; sbox[031]=8'h99;
sbox[032]=8'h9c; sbox[033]=8'h42; sbox[034]=8'h50; sbox[035]=8'hf4; sbox[036]=8'h91; sbox[037]=8'hef; sbox[038]=8'h98; sbox[039]=8'h7a;
sbox[040]=8'h33; sbox[041]=8'h54; sbox[042]=8'h0b; sbox[043]=8'h43; sbox[044]=8'hed; sbox[045]=8'hcf; sbox[046]=8'hac; sbox[047]=8'h62;
sbox[048]=8'he4; sbox[049]=8'hb3; sbox[050]=8'h1c; sbox[051]=8'ha9; sbox[052]=8'hc9; sbox[053]=8'h08; sbox[054]=8'he8; sbox[055]=8'h95;
sbox[056]=8'h80; sbox[057]=8'hdf; sbox[058]=8'h94; sbox[059]=8'hfa; sbox[060]=8'h75; sbox[061]=8'h8f; sbox[062]=8'h3f; sbox[063]=8'ha6;
sbox[064]=8'h47; sbox[065]=8'h07; sbox[066]=8'ha7; sbox[067]=8'hfc; sbox[068]=8'hf3; sbox[069]=8'h73; sbox[070]=8'h17; sbox[071]=8'hba;
sbox[072]=8'h83; sbox[073]=8'h59; sbox[074]=8'h3c; sbox[075]=8'h19; sbox[076]=8'he6; sbox[077]=8'h85; sbox[078]=8'h4f; sbox[079]=8'ha8;
sbox[080]=8'h68; sbox[081]=8'h6b; sbox[082]=8'h81; sbox[083]=8'hb2; sbox[084]=8'h71; sbox[085]=8'h64; sbox[086]=8'hda; sbox[087]=8'h8b;
sbox[088]=8'hf8; sbox[089]=8'heb; sbox[090]=8'h0f; sbox[091]=8'h4b; sbox[092]=8'h70; sbox[093]=8'h56; sbox[094]=8'h9d; sbox[095]=8'h35;
sbox[096]=8'h1e; sbox[097]=8'h24; sbox[098]=8'h0e; sbox[099]=8'h5e; sbox[100]=8'h63; sbox[101]=8'h58; sbox[102]=8'hd1; sbox[103]=8'ha2;
sbox[104]=8'h25; sbox[105]=8'h22; sbox[106]=8'h7c; sbox[107]=8'h3b; sbox[108]=8'h01; sbox[109]=8'h21; sbox[110]=8'h78; sbox[111]=8'h87;
sbox[112]=8'hd4; sbox[113]=8'h00; sbox[114]=8'h46; sbox[115]=8'h57; sbox[116]=8'h9f; sbox[117]=8'hd3; sbox[118]=8'h27; sbox[119]=8'h52;
sbox[120]=8'h4c; sbox[121]=8'h36; sbox[122]=8'h02; sbox[123]=8'he7; sbox[124]=8'ha0; sbox[125]=8'hc4; sbox[126]=8'hc8; sbox[127]=8'h9e;
sbox[128]=8'hea; sbox[129]=8'hbf; sbox[130]=8'h8a; sbox[131]=8'hd2; sbox[132]=8'h40; sbox[133]=8'hc7; sbox[134]=8'h38; sbox[135]=8'hb5;
sbox[136]=8'ha3; sbox[137]=8'hf7; sbox[138]=8'hf2; sbox[139]=8'hce; sbox[140]=8'hf9; sbox[141]=8'h61; sbox[142]=8'h15; sbox[143]=8'ha1;
sbox[144]=8'he0; sbox[145]=8'hae; sbox[146]=8'h5d; sbox[147]=8'ha4; sbox[148]=8'h9b; sbox[149]=8'h34; sbox[150]=8'h1a; sbox[151]=8'h55;
sbox[152]=8'had; sbox[153]=8'h93; sbox[154]=8'h32; sbox[155]=8'h30; sbox[156]=8'hf5; sbox[157]=8'h8c; sbox[158]=8'hb1; sbox[159]=8'he3;
sbox[160]=8'h1d; sbox[161]=8'hf6; sbox[162]=8'he2; sbox[163]=8'h2e; sbox[164]=8'h82; sbox[165]=8'h66; sbox[166]=8'hca; sbox[167]=8'h60;
sbox[168]=8'hc0; sbox[169]=8'h29; sbox[170]=8'h23; sbox[171]=8'hab; sbox[172]=8'h0d; sbox[173]=8'h53; sbox[174]=8'h4e; sbox[175]=8'h6f;
sbox[176]=8'hd5; sbox[177]=8'hdb; sbox[178]=8'h37; sbox[179]=8'h45; sbox[180]=8'hde; sbox[181]=8'hfd; sbox[182]=8'h8e; sbox[183]=8'h2f;
sbox[184]=8'h03; sbox[185]=8'hff; sbox[186]=8'h6a; sbox[187]=8'h72; sbox[188]=8'h6d; sbox[189]=8'h6c; sbox[190]=8'h5b; sbox[191]=8'h51;
sbox[192]=8'h8d; sbox[193]=8'h1b; sbox[194]=8'haf; sbox[195]=8'h92; sbox[196]=8'hbb; sbox[197]=8'hdd; sbox[198]=8'hbc; sbox[199]=8'h7f;
sbox[200]=8'h11; sbox[201]=8'hd9; sbox[202]=8'h5c; sbox[203]=8'h41; sbox[204]=8'h1f; sbox[205]=8'h10; sbox[206]=8'h5a; sbox[207]=8'hd8;
sbox[208]=8'h0a; sbox[209]=8'hc1; sbox[210]=8'h31; sbox[211]=8'h88; sbox[212]=8'ha5; sbox[213]=8'hcd; sbox[214]=8'h7b; sbox[215]=8'hbd;
sbox[216]=8'h2d; sbox[217]=8'h74; sbox[218]=8'hd0; sbox[219]=8'h12; sbox[220]=8'hb8; sbox[221]=8'he5; sbox[222]=8'hb4; sbox[223]=8'hb0;
sbox[224]=8'h89; sbox[225]=8'h69; sbox[226]=8'h97; sbox[227]=8'h4a; sbox[228]=8'h0c; sbox[229]=8'h96; sbox[230]=8'h77; sbox[231]=8'h7e;
sbox[232]=8'h65; sbox[233]=8'hb9; sbox[234]=8'hf1; sbox[235]=8'h09; sbox[236]=8'hc5; sbox[237]=8'h6e; sbox[238]=8'hc6; sbox[239]=8'h84;
sbox[240]=8'h18; sbox[241]=8'hf0; sbox[242]=8'h7d; sbox[243]=8'hec; sbox[244]=8'h3a; sbox[245]=8'hdc; sbox[246]=8'h4d; sbox[247]=8'h20;
sbox[248]=8'h79; sbox[249]=8'hee; sbox[250]=8'h5f; sbox[251]=8'h3e; sbox[252]=8'hd7; sbox[253]=8'hcb; sbox[254]=8'h39; sbox[255]=8'h48;
end
assign s_out=sbox[s_in];
endmodule
III. Experimental Results
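Functional verification was carried out with iverilog using 10 plaintext/ciphertext pairs under a single key; the Makefile and testbench are listed below. All test cases passed, with the actual outputs matching the expected ciphertexts exactly. The testbench compares the results automatically and prints a pass/fail message for every block, which confirms the design's functional correctness for these vectors (the first pair is the example vector from the SM4 standard). A VCD waveform file is also generated for later analysis.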
IVERILOG := iverilog
VVP := vvp
GTKWAVE := gtkwave
SRC := $(wildcard ./*.v)
VCD := sm4_top.vcd
TARGET := sim
IVERILOG_FLAGS := -g2012 -Wall -Wno-timescale
# Recipe lines below must be indented with a tab character.
all: compile run wave
compile: $(SRC)
	@echo "[IVERILOG] Compiling sources..."
	@$(IVERILOG) $(IVERILOG_FLAGS) -o $(TARGET) $(SRC) || (echo "Compilation failed"; exit 1)
run: compile
	@echo "[VVP] Running simulation..."
	@$(VVP) $(TARGET)
	@echo "Simulation completed."
wave:
	@echo "[GTKWAVE] Opening waveforms..."
	@$(GTKWAVE) $(VCD) --autosavename &
.PHONY: all compile run wave
`timescale 1ns/1ps
module sm4_top_tb;
    reg          clk = 0;
    reg          rst_n = 0;
    reg  [127:0] mkey = 0;
    reg          load_mkey = 0;
    reg  [127:0] plaintext = 0;
    reg          din_valid = 0;
    wire [127:0] ciphertext;
    wire         dout_valid;
    wire         busy;
    sm4_top uut (
        .clk(clk),
        .rst_n(rst_n),
        .mkey(mkey),
        .load_mkey(load_mkey),
        .plaintext(plaintext),
        .din_valid(din_valid),
        .ciphertext(ciphertext),
        .dout_valid(dout_valid),
        .busy(busy)
    );
    always #5 clk = ~clk;
    reg [127:0] test_plaintexts      [0:9];
    reg [127:0] expected_ciphertexts [0:9];
    initial begin
        test_plaintexts[0] = 128'h0123456789abcdeffedcba9876543210;
        test_plaintexts[1] = 128'h19dfd145a155ba9582618728cec3129b;
        test_plaintexts[2] = 128'h5ea6ab0e8c952e165b5cb8770cc68454;
        test_plaintexts[3] = 128'h217da38edffa0a313bae2de200c2f0a4;
        test_plaintexts[4] = 128'h9b90f75138905a2455536f8e8c7c48bb;
        test_plaintexts[5] = 128'h2b393de18384c3908814a72bd9082802;
        test_plaintexts[6] = 128'h20b68d21653ae1e63e1f4186a8b38971;
        test_plaintexts[7] = 128'h50bbcc6daca27a2beaeed62752fefcab;
        test_plaintexts[8] = 128'hba030f96f7d880675c0888e2c286aa07;
        test_plaintexts[9] = 128'h744312ac78eab65501985ef67532d86b;
        expected_ciphertexts[0] = 128'h681edf34d206965e86b3e94f536e4246;
        expected_ciphertexts[1] = 128'h4f4bb97495eda50ee3d4773f8a70961b;
        expected_ciphertexts[2] = 128'h0c18de048cf8ad1a136b32426539fbd8;
        expected_ciphertexts[3] = 128'hba6b80da7ab003b8ec1a65b6e44e50aa;
        expected_ciphertexts[4] = 128'hf606e5dacd97c4bb6cdb5c51a210a4e2;
        expected_ciphertexts[5] = 128'h86637413cc9695b38d7fddd2c8b3682b;
        expected_ciphertexts[6] = 128'hbfbf1d47b7956bc2564d79b59d08cdbc;
        expected_ciphertexts[7] = 128'hb86d60aa7ad3047aee3a75348e011e49;
        expected_ciphertexts[8] = 128'h7f75667ecf8f1079337d70643c0e74a5;
        expected_ciphertexts[9] = 128'h0e3a9246a1b0ad477b5d0c33ff72ca40;
    end
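    integer i = 0;                      // index of the next expected ciphertext
    initial begin
        // Release reset between edges and change inputs on falling edges only,
        // so no input changes at the same instant as the rising edge that samples it.
        #12 rst_n = 1;
        @(negedge clk);
        mkey      = 128'h0123456789abcdeffedcba9876543210;
        load_mkey = 1;                  // one-cycle pulse
        @(negedge clk);
        load_mkey = 0;
        wait (busy == 0);               // 32-cycle key expansion finished
        @(negedge clk);
        #20 plaintext = test_plaintexts[0]; din_valid = 1;   // one isolated block
        #10 din_valid = 0;
        #20 plaintext = test_plaintexts[1]; din_valid = 1;   // blocks 1 to 4 back to back
        #10 plaintext = test_plaintexts[2];
        #10 plaintext = test_plaintexts[3];
        #10 plaintext = test_plaintexts[4];
        #10 din_valid = 0;
        #30 plaintext = test_plaintexts[5]; din_valid = 1;   // blocks 5 to 9 back to back
        #10 plaintext = test_plaintexts[6];
        #10 plaintext = test_plaintexts[7];
        #10 plaintext = test_plaintexts[8];
        #10 plaintext = test_plaintexts[9];
        #10 din_valid = 0;
        wait (busy == 0);               // all blocks have left the pipeline
        #100 $finish;
    end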
    // Check each valid output against the next expected ciphertext (blocks leave in input order).
    always @(posedge clk) begin
        if (dout_valid) begin
            if (ciphertext === expected_ciphertexts[i]) begin
                $display("Test %0d: Passed, Expected %h, Actual %h", i, expected_ciphertexts[i], ciphertext);
            end else begin
                $display("Test %0d: Failed, Expected %h, Actual %h", i, expected_ciphertexts[i], ciphertext);
            end
            i = i + 1;
        end
    end
    initial begin
        $dumpfile("sm4_top.vcd");
        $dumpvars(0, sm4_top_tb);
    end
endmodule
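The simulation waveforms, viewed in GTKWave and ModelSim, show the pipeline at work. When din_valid is asserted, the plaintext enters the pipeline, and dout_valid rises 32 clock cycles after the edge that samples din_valid (one input register followed by 32 round registers), with the corresponding ciphertext on the output. The busy signal tracks the system state correctly, and the control signals behave as expected in this test, in which key expansion and encryption do not overlap.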
The design was synthesized with Vivado for the XC7A35T-1CSG324C device, with the following results:
IV. Summary
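This article presented a pipelined Verilog implementation of the SM4 block cipher. SM4, a Chinese national standard cipher, uses a 32-round nonlinear iterative structure, and this design achieves high-throughput hardware encryption by fully unrolling those rounds into a pipeline. The system is split into two main modules, key expansion and encryption: the key expansion module precomputes the 32 round keys, and the encryption module processes data in a 32-stage pipeline. The Verilog code is organized hierarchically into the top level, key expansion, the encryption round function and the S-box, with status signals coordinating the pipeline. Simulation showed the design to be functionally correct for all test vectors used, including the standard example vector: after key expansion it accepts one 128-bit block per clock cycle and delivers each ciphertext 32 cycles later.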