The document proposes a multi-level hardware/software co-design approach for sorting integers with arbitrarily many digits. A fixed-digit hardware sorter implements the first level of sorting. A software-programmed radix sort with radix 2^k implements the second level, where k is the digit width of the hardware sorter. The fixed hardware sorter can thus be reused, in combination with the software sort, for longer integers, which improves flexibility and cost-performance over a pure hardware approach.

ARBITRARY LONG DIGIT INTEGER SORTER HW/SW CO-DESIGN

Shun-Wen Cheng
Tamkang University, Taipei, TAIWAN
E-mail: [email protected]

Abstract
The arrival of the multimedia era and the information-security era means that systems must process integers with ever more digits. Previous sorting research has focused on raw performance for large quantities of numbers with a finite, fixed digit/bit width. This paper discusses how to solve the arbitrary long digit integer sorting problem effectively by HW/SW co-design under the Area-Time^2 (AT^2) price-performance constraint. The work proposes a multi-level (two-level) sort architecture to attain this objective: an existing fixed-digit (k-bit) hardware sorter implements the first, or basic, level of sorting, and a software-programmed radix-2^k sort implements the second, or higher, level. Through Super Radix Sorting HW/SW co-design and reuse techniques, the work makes fixed-digit HW sorters more flexible and useful.

[Figure 1 (diagram): compare & swap elements, each taking inputs a and b and producing max(a, b) and min(a, b).]
Figure 1. Compare & swap elements are vital for sorting

Index Terms: HW/SW co-design, Reusable & Embedded Cores, Sorting, Radix Sort, Technology Independent Methodologies, System-on-a-Chip (SoC), VLSI design.


I. INTRODUCTION
Sorting is one of the most important problems in computer science. Many fundamental processes in computing and communication systems require data to be sorted. Sorting networks play a key role in parallel computing, multi-access memories, and multiprocessing [3], [4], [5], [6], [11], [13], [14], [19]. Compare-and-swap elements are vital for sorting, as depicted in Fig. 1. However, if very long digit integers must be sorted and a hardware sorter of the corresponding width is designed directly, the comparators and networks become very large. The circuit schematics of 1-, 2-, 4-, and 16-bit magnitude comparators are depicted in Fig. 2. Buses grow as well: in a design for 32-digit integers every bus is a 32-bit line, and in a design for 64-digit integers every bus is a 64-bit line, which doubles the wiring and its area. More importantly, the circuit cost/complexity of a 2k-bit comparator is more than twice that of a k-bit comparator, as shown in Table 1. In addition, the fan-out of a CMOS circuit is limited, so extra buffers must be added to the comparator circuits.
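
[Figure 2 (schematics): magnitude comparators with inputs A and B and outputs A>B, A=B, A<B. Panels: (a) 1-bit magnitude comparator; (b) 2-bit comparator; (c) 4-bit comparator (inputs A3..A0, B3..B0); (d) 16-bit comparator (built on the 4-bit groups A/B 0-3, 4-7, 8-11, 12-15).]
Figure 2. The circuits of magnitude comparators.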

Magnitude comparator    CMOS cost (gate count)
1-bit                   12P + 12N
2-bit                   39P + 39N
4-bit                   87P + 87N
16-bit                  399P + 399N

Table 1. The circuit cost/complexity of a long bit/digit comparator is much higher than that of a short bit/digit one.
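The growth in Table 1 can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary and function names are hypothetical, and it uses the P-transistor counts from Table 1 (the N counts are identical) to compute the cost per compared bit.

```python
# P-transistor counts copied from Table 1 (the N-transistor counts are the same).
COMPARATOR_COST = {1: 12, 2: 39, 4: 87, 16: 399}


def cost_per_bit(costs):
    """Return {comparator width in bits: P-transistors per compared bit}."""
    return {width: count / width for width, count in costs.items()}


per_bit = cost_per_bit(COMPARATOR_COST)
for width, value in per_bit.items():
    print(f"{width:2d}-bit comparator: {value:.2f} P-transistors per bit")
```

The per-bit cost rises with the width, which is the point the table makes: widening a comparator costs more than proportionally.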

Single sorter chip design (m numbers)             Function blocks of one sorter cell    Cost of each functional block (CMOS transistor gate count)
16-bit Enumeration Sorter [23] (1982)             Two 16-bit data registers             256P + 256N
                                                  One 16-bit comparator                 399P + 399N
                                                  One 8-bit counter                     136P + 136N
16-bit VLSI Sorter [16] (1983)                    Two 16-bit data registers             256P + 256N
                                                  One 16-bit comparator                 399P + 399N
                                                  Two 16-bit 2-way multiplexers         64P + 64N
16-bit Rebound Sorter [8] (1978 [8], 1989 [2])    Two 8-bit data registers              128P + 128N
                                                  One 8-bit comparator                  195P + 195N
                                                  Two 8-bit 2-way multiplexers          32P + 32N
16-bit Bit-Serial Sorter [1] (1991)               One 1-bit comparator                  12P + 12N
                                                  One 16-bit shift register             452P + 452N
                                                  Two 1-bit 2-way multiplexers          4P + 4N
                                                  Two 1-bit delay elements              2P + 2N

Further columns: Number of cells; Number of clock cycles (surviving entries: m/2, N + 1).

Table 2. Chip comparison of m-number 16-bit hardware sorter designs.

In Table 2, some sorter chip designs show hardware-expandable properties [1], [2], [8], but they are not good enough for an arbitrary long digit integer sorter design. The time performance of a fixed-digit (k-bit) hardware sorter is often better than that of a software sort program for the same digit width, as displayed in Table 3. However, a pure hardware sorter still has a higher area cost and some restrictions, so it is not yet common on commercial CPUs.

Based on these physical considerations, the author focuses on effectively solving the arbitrary long digit integer sorting problem by HW/SW co-design under the Area-Time^2 (AT^2) cost-performance trade-off constraint [20], [21]. Several AT^2-optimal sorting networks under different word-length models have been proposed in [7], [9], [15], and [17]. For embedded systems, a uniprocessor software solution is often not applicable because of insufficient I/O and performance, while realizing multiprocessor sorting methods on parallel computers is much too expensive in area cost and power consumption. As data processing migrates from 32-bit to 64-bit, 128-bit, or even wider words, a fixed-digit pure HW sorter alone cannot meet the demand.

All of the sorting algorithms and circuits in this paper are based on commonly known algorithms and structures. Nevertheless, making an existing hardware sorter reusable [12], making a pure HW sorter more flexible, and balancing its cost-performance are valuable and necessary.

This paper is organized as follows. Section II briefly introduces the basic LSD radix sort algorithm. A cost-benefit balanced multi-level (two-level) HW/SW mixed sort architecture is then given and discussed in Section III. Finally, Section IV concludes the major findings and outlines future work.

Design                                                       Area Perf. (A)        Time Perf. (Td)
Uniprocessor Heapsort                                        log N                 N (log2 N)^2
(1 + log2 N)-processor Mergesort                             (log2 N)^2            N log2 N
(log2 N)^2-processor Bitonic Sort                            (log2 N)^3            N
N-processor Bitonic Sort on Mesh                             N (log2 N)^2          sqrt(N)
N-processor Bitonic Sort on Shuffle-Exchange Net [19]        N^2 / (log2 N)^2      (log2 N)^3
N (log2 N)^2-comparator Bitonic Sort                         N^2 / log2 N          (log2 N)^2
N^2-comparator Bubble Sort                                   N log2 N              N

Table 3. Area-time bounds for the finite and fixed bit/digit number sorting problem [21].

II. STRAIGHT RADIX SORT ALGORITHM
This approach begins with the least significant key and is known as LSD (Least Significant Digit) sort. After the sort on one key, the piles are put together to obtain a single pile, which is then sorted on the next more significant key. This process continues until the pile has been sorted on the most significant key [13], at which point the sorted sequence is obtained.

Complexity
As shown in Fig. 3, it takes n steps to put all the elements in queue AUX and d steps to initialize the queues Q[i]. The main loop of the algorithm, which is executed m times (once per digit; m is written k in Fig. 3), pops each element from AUX and pushes it into one of the Q[i]s. It also concatenates all the Q[i]s together. The overall running time of the algorithm is therefore O(m n). If m is bounded or small, it can be treated as a constant and the time complexity becomes O(n); this is the common case on an ordinary CPU.

Algorithm Straight_Radix_Sort (A[ ], n, k)
(* Input: A[ ] (an array of integers, each with k digits, in the range 1 to n).
   Output: A[ ] (the array in sorted order). *)
begin
    Assume that all elements are initially in an auxiliary queue AUX;
    (* The use of AUX is for simplicity; it can be implemented by array A *)
    for i := 1 to d do    (* d is the number of possible digits; d = 10 in case of decimal numbers *)
        Initialize queue Q[i] to be empty;
    for i := k downto 1 do    (* digit k is the least significant digit *)
        while AUX is not empty do
            Pop x from AUX;
            t := the i-th digit of x;
            Insert x into Q[t];
        for j := 1 to d do
            Insert Q[j] into AUX;
    for i := 1 to n do
        Pop A[i] from AUX;
end.

A radix-10 sorting example:
Input: 232, 321, 213, 231, 111, 112, 132, 123, 221
Pass 1 (ones digit):
    1s: 321, 231, 111, 221
    2s: 232, 112, 132
    3s: 213, 123
    Concatenated: 321, 231, 111, 221, 232, 112, 132, 213, 123
Pass 2 (tens digit):
    10s: 111, 112, 213
    20s: 321, 221, 123
    30s: 231, 232, 132
    Concatenated: 111, 112, 213, 321, 221, 123, 231, 232, 132
Pass 3 (hundreds digit):
    100s: 111, 112, 123, 132
    200s: 213, 221, 231, 232
    300s: 321
Result: 111, 112, 123, 132, 213, 221, 231, 232, 321

Figure 3. Basic straight radix sort algorithm and a radix-10 sorting example.
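The queue-based algorithm of Fig. 3 can be sketched in runnable form as follows. This is an illustrative sketch, not the author's code: the function name and parameters are assumptions, digits are extracted arithmetically, and the queues are indexed 0 to radix-1 instead of 1 to d.

```python
from collections import deque


def straight_radix_sort(values, num_digits, radix=10):
    """LSD radix sort of non-negative integers with one FIFO queue per digit value."""
    aux = deque(values)                          # plays the role of queue AUX
    queues = [deque() for _ in range(radix)]     # Q[0] .. Q[radix - 1]
    for position in range(num_digits):           # least significant digit first
        divisor = radix ** position
        while aux:
            x = aux.popleft()
            digit = (x // divisor) % radix
            queues[digit].append(x)              # FIFO order keeps each pass stable
        for q in queues:                         # concatenate the piles in digit order
            aux.extend(q)
            q.clear()
    return list(aux)


data = [232, 321, 213, 231, 111, 112, 132, 123, 221]
result = straight_radix_sort(data, num_digits=3)
print(result)  # [111, 112, 123, 132, 213, 221, 231, 232, 321]
```

The input is the sequence from the radix-10 example, and the printed list matches the result given there.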

III. A MULTI-LEVEL MIXED ARCHITECTURE: SUPER RADIX SORT

Figure 3 also displays an LSD radix-10 sorting example using linked allocation [9]. When the radix is very large, however, linked-list allocation becomes inefficient. The example brings out the benefits of LSD radix sort directly: (1) the key size can be changed easily; (2) there are no recursive function calls and hence no stack-size problem. To solve the arbitrary long digit integer sorting problem under the cost-performance trade-off constraint, these benefits of LSD radix sort are pushed to their limit. Because the integers are very long, a bit-field structure is needed to reduce the memory requirement and accelerate the sort.

As depicted in Fig. 4, a two-level HW/SW mixed sort architecture is proposed: an existing fixed-digit (k-bit) hardware sorter implements the first (bottom) level of sorting, and a software-programmed LSD radix sort (radix 2^k) running on the CPU implements the second (higher) level. The sort operation then appears in assembly code, as Fig. 5 shows. The architecture can directly handle sorting jobs on integers of up to 2^32 · k digits (if 32 is the length of a common register); with k = 16, that is up to 2^32 · 16 digits. If the number of digits exceeds this quota, a similar multi-level mixed sort architecture can be considered. If the input sequence is itself arbitrarily long, some special designs provide solutions [24]; alternatively, the sequence can be separated into several pieces, which are sorted and then merged to obtain the total result.

Because the numbers are very long, comparing two numbers and then swapping them directly is very inefficient [18]. An indirect method that records only the swapped indices and holds them in cache is a better idea.

If the system has only a common CPU, the longest number has m bits, and the sort algorithm is radix sort, the average overall running time is m O(N). If the system also has an existing fixed-digit (k-bit) hardware sorter and the longest number has m bits, the overall running time of the proposed method becomes (m / k) Td, where Td is the sorting time of the hardware sorter. If the HW sorter is an N (log2 N)^2-comparator bitonic sorter, the overall running time is (m / k) O((log2 N)^2). Some comparisons are shown in Table 4. The proposed HW/SW mixed super radix sorting architecture makes it easy to change the HW/SW partitioning ratio, as displayed in Fig. 6, and so to obtain a cost-benefit balanced, flexible HW/SW mixed design. The fixed-digit (k-bit) hardware sorter can be any existing design of the designer's choice or a custom one.

[Figure 4 (diagram): the integers are held as 32-bit words in main memory / virtual memory space; data is loaded over a high-speed system bus into the k-bit (32-bit) HW sorter, under control of the CPU and a TAG cache. For the 88-digit example, 88 / 32 rounds up to 3 passes, counted down in CX.]
(a) Sort step 1 (CX = 3).
(b) Sort step 2 (CX = 2).
(c) Sort step 3 (CX = 1).

DATA SEGMENT
NUM DB .............
...............
DATA ENDS
..........................
CODE SEGMENT
..........................
SORT_START:
    MOV ESI, OFFSET NUM    ; source address
    MOV EBX, 88D           ; digit of number = 88
    MOV EDX, 9D            ; there are 9 numbers to be sorted
    SORT
.........................
CODE ENDS

Figure 5. Why does the instruction SORT not appear in today's instruction sets?

[Figure 6 (diagram): sort speed (performance), from slow to fast, against total area (cost), from small to large. The design points run from a common CPU through 16-bit, 32-bit, and 64-bit HW sorters to a pure ASIC. Software sorting (super radix sort) increases toward the "big CPU + light HW sorter" end; hardware sorting increases toward the "small CPU + powerful HW sorter" end.]
Figure 6. Impact and challenge of hardware/software co-design trade-off.

Figure 7 demonstrates a super radix sort: 88-digit integers are sorted by a SW LSD radix-4,294,967,296 (radix-2^32) sort mixed with a 32-bit HW sorter, which needs 3 steps. If the same 88-digit integers are processed by a SW LSD radix-65,536 (radix-2^16) sort mixed with a 16-bit HW sorter, 6 steps are needed. If the hardware sorter can easily be decomposed into several stages and then pipelined, it achieves higher hardware sharing and throughput, as Fig. 8 depicts.
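The step counts quoted above follow from rounding m / k up to the next integer. A minimal sketch, assuming a hypothetical helper name and using integer ceiling division:

```python
def passes_needed(m_bits, k_bits):
    """Number of k-bit hardware sorting passes needed for m-bit integers (m / k rounded up)."""
    return -(-m_bits // k_bits)


for k in (16, 32):
    print(f"88-digit integers, {k}-bit HW sorter: {passes_needed(88, k)} steps")
```

For the 88-digit example this gives 6 steps with a 16-bit sorter and 3 steps with a 32-bit sorter, as stated in the text.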

Tag[X]  num[X][2]  num[X][1]  num[X][0]

Origin:
[1]  00000110 0100000010010000  1010000000010100 0010101000000000  0001000000101000 0001010101010000
[2]  10101000 0010101010101000  0000101010001010 0100000010101011  0000000000000010 0000000000000000
[3]  00000000 0000010111110000  0000000000000000 0000000010101010  0000010101000000 0101100000001100
[4]  00000011 0111100000000100  1100000011110000 0001110000010010  0000000000000000 0000000000000010
[5]  00000000 0000000000000000  0000000000000000 0000010100001000  0000000000000000 0000001100110011
[6]  00000000 0000000000000000  0000000000000001 0000100010000000  0000000000000100 0010100000010000
[7]  00000000 0000000000000001  0001010000001000 0000000100101000  0000000101000000 0000000000111000
[8]  00000000 0000000000000000  0000000100100000 0000001000010100  0010001000101000 0000001010000100
[9]  00000000 0001000010101000  0000100001110000 0000100000011000  0000000000001111 0000000001100000

Step 1:
[4]  00000011 0111100000000100  1100000011110000 0001110000010010  0000000000000000 0000000000000010
[5]  00000000 0000000000000000  0000000000000000 0000010100001000  0000000000000000 0000001100110011
[2]  10101000 0010101010101000  0000101010001010 0100000010101011  0000000000000010 0000000000000000
[6]  00000000 0000000000000000  0000000000000001 0000100010000000  0000000000000100 0010100000010000
[9]  00000000 0001000010101000  0000100001110000 0000100000011000  0000000000001111 0000000001100000
[7]  00000000 0000000000000001  0001010000001000 0000000100101000  0000000101000000 0000000000111000
[3]  00000000 0000010111110000  0000000000000000 0000000010101010  0000010101000000 0101100000001100
[1]  00000110 0100000010010000  1010000000010100 0010101000000000  0001000000101000 0001010101010000
[8]  00000000 0000000000000000  0000000100100000 0000001000010100  0010001000101000 0000001010000100

Step 2:
[3]  00000000 0000010111110000  0000000000000000 0000000010101010  0000010101000000 0101100000001100
[5]  00000000 0000000000000000  0000000000000000 0000010100001000  0000000000000000 0000001100110011
[6]  00000000 0000000000000000  0000000000000001 0000100010000000  0000000000000100 0010100000010000
[8]  00000000 0000000000000000  0000000100100000 0000001000010100  0010001000101000 0000001010000100
[9]  00000000 0001000010101000  0000100001110000 0000100000011000  0000000000001111 0000000001100000
[2]  10101000 0010101010101000  0000101010001010 0100000010101011  0000000000000010 0000000000000000
[7]  00000000 0000000000000001  0001010000001000 0000000100101000  0000000101000000 0000000000111000
[1]  00000110 0100000010010000  1010000000010100 0010101000000000  0001000000101000 0001010101010000
[4]  00000011 0111100000000100  1100000011110000 0001110000010010  0000000000000000 0000000000000010

Step 3:
[5]  00000000 0000000000000000  0000000000000000 0000010100001000  0000000000000000 0000001100110011
[6]  00000000 0000000000000000  0000000000000001 0000100010000000  0000000000000100 0010100000010000
[8]  00000000 0000000000000000  0000000100100000 0000001000010100  0010001000101000 0000001010000100
[7]  00000000 0000000000000001  0001010000001000 0000000100101000  0000000101000000 0000000000111000
[3]  00000000 0000010111110000  0000000000000000 0000000010101010  0000010101000000 0101100000001100
[9]  00000000 0001000010101000  0000100001110000 0000100000011000  0000000000001111 0000000001100000
[4]  00000011 0111100000000100  1100000011110000 0001110000010010  0000000000000000 0000000000000010
[1]  00000110 0100000010010000  1010000000010100 0010101000000000  0001000000101000 0001010101010000
[2]  10101000 0010101010101000  0000101010001010 0100000010101011  0000000000000010 0000000000000000

Step 1:
    Input sequence: A[1][0], A[2][0], A[3][0], A[4][0], A[5][0], A[6][0], A[7][0], A[8][0], A[9][0]
    After HW sorting: A[4][0], A[5][0], A[2][0], A[6][0], A[9][0], A[7][0], A[3][0], A[1][0], A[8][0]
    Only the swapped index is recorded: 4, 5, 2, 6, 9, 7, 3, 1, 8.
Step 2:
    Input sequence: A[4][1], A[5][1], A[2][1], A[6][1], A[9][1], A[7][1], A[3][1], A[1][1], A[8][1]
    After HW sorting: A[3][1], A[5][1], A[6][1], A[8][1], A[9][1], A[2][1], A[7][1], A[1][1], A[4][1]
    Only the swapped index is recorded: 3, 5, 6, 8, 9, 2, 7, 1, 4.
Step 3:
    Input sequence: A[3][2], A[5][2], A[6][2], A[8][2], A[9][2], A[2][2], A[7][2], A[1][2], A[4][2]
    After HW sorting: A[5][2], A[6][2], A[8][2], A[7][2], A[3][2], A[9][2], A[4][2], A[1][2], A[2][2]
    Only the swapped index is recorded: 5, 6, 8, 7, 3, 9, 4, 1, 2.
The final index is the answer.
* Swapping the original whole numbers during these sorting steps is unnecessary.

Figure 7. Super Radix Sort: 88-digit integer SW LSD radix-4,294,967,296 (radix-2^32) sort with 32-bit HW sorter mixed sorting.
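The three steps of Fig. 7 can be replayed in software. The sketch below is a simulation under stated assumptions, not the paper's implementation: Python's stable `sorted` stands in for the 32-bit hardware sorter, all function and variable names are hypothetical, and only the tag order is permuted, following the indirect (index-only) method described in the text.

```python
# The nine 88-bit numbers of Fig. 7, keyed by tag; spaces are only for readability.
ROWS = {
    1: "00000110 0100000010010000 1010000000010100 0010101000000000 0001000000101000 0001010101010000",
    2: "10101000 0010101010101000 0000101010001010 0100000010101011 0000000000000010 0000000000000000",
    3: "00000000 0000010111110000 0000000000000000 0000000010101010 0000010101000000 0101100000001100",
    4: "00000011 0111100000000100 1100000011110000 0001110000010010 0000000000000000 0000000000000010",
    5: "00000000 0000000000000000 0000000000000000 0000010100001000 0000000000000000 0000001100110011",
    6: "00000000 0000000000000000 0000000000000001 0000100010000000 0000000000000100 0010100000010000",
    7: "00000000 0000000000000001 0001010000001000 0000000100101000 0000000101000000 0000000000111000",
    8: "00000000 0000000000000000 0000000100100000 0000001000010100 0010001000101000 0000001010000100",
    9: "00000000 0001000010101000 0000100001110000 0000100000011000 0000000000001111 0000000001100000",
}


def chunk(value, index, k):
    """Return the index-th k-bit chunk of value (index 0 is the least significant)."""
    return (value >> (index * k)) & ((1 << k) - 1)


def hw_sort(tags, keys):
    """Software stand-in for the k-bit HW sorter: stable sort of tags by their k-bit keys."""
    return sorted(tags, key=lambda tag: keys[tag])


def super_radix_sort(numbers, m, k):
    """Return the tag order after each LSD pass; the numbers themselves are never moved."""
    order = list(numbers)                 # initial tag order, as given
    history = []
    for step in range(-(-m // k)):        # m / k rounded up
        keys = {tag: chunk(numbers[tag], step, k) for tag in order}
        order = hw_sort(order, keys)
        history.append(order)
    return history


numbers = {tag: int(bits.replace(" ", ""), 2) for tag, bits in ROWS.items()}
history = super_radix_sort(numbers, m=88, k=32)
for step, order in enumerate(history, start=1):
    print(f"Step {step}: {order}")
```

Each printed order corresponds to the "swapped index" recorded in the figure. The stand-in sorter must be stable for the LSD scheme to be correct, which is why a stable sort is used here.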

HW / SW                                                  Running time (avg. case)
Pure common CPU / Radix Sort                             m O(N)
Pure common CPU / Quick Sort [11]                        m O(N log N)
CPU & one 32-bit HW sorter* / Super Radix Sort           (m / 32) Td
CPU & one 64-bit HW sorter* / Super Radix Sort           (m / 64) Td
CPU & one 256-bit HW sorter* / Super Radix Sort          (m / 256) Td

* If the HW sorter is an N (log2 N)^2-comparator bitonic processor, the order of Td is O((log2 N)^2).

Table 4. The performance order comparison between original architectures and new mixed architectures.

[Figure 8 (diagram): a bitonic sorter built from three levels: Level 1 (stage 1), Level 2 (stages 1 and 2), and a Level 3 sub-sorter (stages 1 to 3), which together form a 6-stage pipeline (stages 1 to 6).]
Figure 8. A three-level bitonic sorter. Pipelining this type of circuit yields higher hardware sharing and throughput.
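To make the stage structure concrete, the sketch below builds a standard bitonic sorting network from compare-and-swap elements and counts its stages. It is an illustration only: the function names are hypothetical, and it assumes that the three levels of Fig. 8 correspond to the merge sizes 2, 4, and 8 of an 8-input network, which is not stated explicitly in the figure.

```python
def bitonic_stages(n):
    """Return the stages of a bitonic sorting network for n = 2**p inputs.

    Each stage is a list of (i, j, ascending) compare-and-swap elements
    that touch disjoint wires and can therefore operate in parallel.
    """
    stages = []
    size = 2
    while size <= n:                      # one level per merge size: 2, 4, ..., n
        stride = size // 2
        while stride >= 1:                # one stage per stride within the level
            stage = []
            for i in range(n):
                j = i ^ stride
                if j > i:
                    stage.append((i, j, (i & size) == 0))
            stages.append(stage)
            stride //= 2
        size *= 2
    return stages


def run_network(values, stages):
    """Apply the compare-and-swap stages to a copy of values and return it."""
    data = list(values)
    for stage in stages:
        for i, j, ascending in stage:
            # Ascending element: smaller value ends on wire i; descending: larger on wire i.
            if (data[i] > data[j]) == ascending:
                data[i], data[j] = data[j], data[i]
    return data


stages = bitonic_stages(8)
print(len(stages))                                      # 6
print(run_network([5, 2, 7, 1, 8, 3, 6, 4], stages))    # [1, 2, 3, 4, 5, 6, 7, 8]
```

For 8 inputs the network has 1 + 2 + 3 = 6 stages of 4 comparators each, which agrees with the 6-stage pipeline shown in the figure; in general the depth grows as (log2 N)^2, in line with the Td order given for the bitonic sorter.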

Parameter      Range            Layout cost                        Hardware reusing
Old design     Fixed k-digit    1                                  0
New design     2^32 k-digit     No change (slight modification)    High

Table 5. The new design has high hardware reuse.

IV. CONCLUDING REMARK

This paper discusses how to solve the arbitrary long digit integer sorting problem effectively by HW/SW co-design under the Area × Time^2 (AT^2) price-performance constraint. The work introduces a two-level (multi-level) sort architecture that attains this objective: an existing fixed-digit (k-digit) hardware sorter implements the first level of sorting, and a software-programmed LSD radix sort (radix 2^k) implements the second level. As Table 5 shows, through HW/SW co-design and reuse methodology, the proposed mixed super radix sorting architecture makes existing hardware sorters more flexible and useful. It is time to put a hardware sorter on a common commercial CPU or network processor.

REFERENCES
[1] M. Afghahi, A 512 16-b Bit-serial Sorter Chip, IEEE J. Solid-State Circuits, vol. 26, pp. 1452-1457, Oct. 1991.
[2] B. Ahn and J. M. Murray, A Pipelined, Expandable VLSI Sorting Engine Implemented in CMOS Technology, in Proc. IEEE Intl. Symp. on Circuits and Systems, 1989, pp. 134-137.
[3] S. G. Akl, Parallel Sorting Algorithms. Reading, New York: Academic Press, 1985.
[4] K. E. Batcher, Sorting Networks and Their Applications, in Proc. AFIPS 1968 Spring Joint Computer Conference, pp. 307-314, Apr. 1968.
[5] G. Baudet and D. Stevenson, Optimal Sorting Algorithms for Parallel Computer, IEEE Trans. Computers, vol. 27, pp. 84-87, Jan. 1978.
[6] R. Beigel and J. Gill, Sorting n Objects with a k-sorter, IEEE Trans. Computers, vol. 39, pp. 714-716, May 1990.
[7] G. Bilardi and F. P. Preparata, A Minimum Area VLSI Network for O(log n) Time Sorting, IEEE Trans. Computers, vol. 34, pp. 336-343, May 1985.
[8] T. C. Chen, Vincent Y. Lum, and C. Tung, The Rebound Sorter: An Efficient Sort Engine for Large File, in IEEE Proc. 4th Intl Conf. on Very Large Data Bases, pp. 312-318, Sep. 1978.
[9] R. Cole and A. R. Seigel, Optimal VLSI Circuits for Sorting, JACM, vol. 35, pp. 777-809, 1988.
[10] Edward H. Friend, Sorting on Electronic Computer Systems, JACM, vol. 3, pp. 134-168, 1956.
[11] C. A. R. Hoare, Quicksort, Computing Journal, vol. 5, pp. 10-15, 1962.
[12] M. Keating and P. Bricaud, Reuse Methodology Manual. Reading: Kluwer, 1998.
[13] D. E. Knuth, The Art of Computer Programming, Vol 3: Sorting and Searching. Reading: Addison-Wesley, 1973.
[14] J.-G. Lee and B.-G. Lee, Realization of Large-scale Distributors Based on Batcher Sorters, IEEE Trans. Communications, vol. 47, pp. 1103-1110, July 1999.
[15] T. Leighton, Tight Bounds on The Complexity of Parallel Sorting, IEEE Trans. Computers, vol. 34, pp. 344-354, Apr. 1985.
[16] G. S. Miranker, Luong Tang, and Chak-Kuen Wong, A Zero-Time VLSI Sorter, IBM J. Research & Development, vol. 27, pp. 140-148, Mar. 1983.
[17] S. Olariu, M. C. Pinotti, and S. Q. Zheng, How to Sort N Items Using a Sorting Network of Fixed I/O Size, IEEE Trans. Parallel and Distributed Sys., vol. 10, pp. 487-499, May 1999.
[18] B. Parhami and D.-M. Kwai, Data-driven Control Scheme for Linear Arrays: Application to a Stable Insertion Sorter, IEEE Trans. Parallel and Distributed Sys., vol. 10, pp. 23-28, Jan. 1999.
[19] H. S. Stone, Parallel Processing with the Perfect Shuffle, IEEE Trans. Computers, vol. 20, pp. 153-161, Feb. 1971.
[20] C. D. Thompson, Area-Time Complexity for VLSI, in Proc. 11th Annual ACM Symp. on Theory of Comp., pp. 81-88, Apr. 1979.
[21] C. D. Thompson, The VLSI Complexity of Sorting, IEEE Trans. Computers, vol. 32, pp. 1171-1184, Dec. 1983.
[22] N. H. E. Weste and K. Eshraghian, Principle of CMOS VLSI Design, 2nd Ed., Reading: Addison-Wesley, 1993.
[23] H. Yasuura, N. Takagi, and S. Yajima, The Parallel Enumeration Sorting Scheme for VLSI, IEEE Trans. Computers, vol. 31, pp. 1192-1201, Dec. 1982.
[24] S. Q. Zheng, S. Olariu, and M. C. Pinotti, A Systolic Architecture for Sorting an Arbitrary Number of Elements, in Proc. 1997 3rd Int. Conf. Algorithms and Architectures for Parallel Processing, pp. 113-126, 1997.
