6. Experimental results and comparisons

The 2-D dual-mode LDWT trades off low transpose memory against low complexity in the design of the VLSI architecture. Tables 2 and 3 compare the performance of the proposed architecture with that of other similar architectures. The comparisons indicate that the proposed VLSI architecture outperforms previous works in terms of transpose memory size, requiring about 50% less memory than the JPEG2000 standard (Chen, 2004) architecture. Moreover, the 2-D LDWT is frame-based, and its implementation bottleneck is the huge transpose memory. Our architecture needs fewer memory units, and its latency is fixed at (3/2)N+3 clock cycles. It also provides an embedded symmetric extension function. The proposed IRSA approach is both memory-efficient and fast. The proposed 2-D dual-mode LDWT adopts parallel and pipelined schemes to reduce the transpose memory and increase the operation speed. Shifters and adders replace multipliers in the computation to reduce the hardware cost.

Chen and Wu (Chen & Wu, 2002) proposed a folded and pipelined architecture to compute the 2-D 5/3 lifting-based DWT, using a transpose memory of size 2.5N for an N×N 2-D DWT. Their lifting architecture for vertical filtering, with two adders and one multiplier, is divided into two parts, each with one adder and one multiplier. Because the two parts are active in different cycles, they can share the same adder and multiplier, which increases the hardware utilization and reduces the latency. However, owing to the characteristics of the signal flow, this sharing also increases the complexity.
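The replacement of multipliers by shifters and adders can be illustrated in software. The following is a minimal sketch, assuming the reversible integer 5/3 lifting steps of JPEG2000 with whole-sample symmetric extension at the signal boundaries. It is a 1-D software model only, not the hardware datapath of the proposed architecture, and the function names `lift_53_forward` and `lift_53_inverse` are hypothetical.

```python
def lift_53_forward(x):
    """One level of 1-D reversible 5/3 lifting on an even-length list of ints.

    Only additions, subtractions and arithmetic shifts are used.
    Returns (s, d): low-pass and high-pass coefficients.
    """
    n = len(x)
    if n < 2 or n % 2 != 0:
        raise ValueError("this sketch assumes an even-length signal")
    half = n // 2
    # Split phase: separate even- and odd-indexed samples.
    even, odd = x[0::2], x[1::2]
    # Predict phase: d[i] = odd[i] - floor((even[i] + even[i+1]) / 2).
    # Symmetric extension: the sample past the right edge mirrors even[i].
    d = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        d.append(odd[i] - ((even[i] + right) >> 1))
    # Update phase: s[i] = even[i] + floor((d[i-1] + d[i] + 2) / 4).
    # Symmetric extension: the coefficient before the left edge mirrors d[0].
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        s.append(even[i] + ((left + d[i] + 2) >> 2))
    return s, d


def lift_53_inverse(s, d):
    """Undo lift_53_forward by running the lifting steps in reverse order."""
    half = len(s)
    even = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        even.append(s[i] - ((left + d[i] + 2) >> 2))
    odd = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        odd.append(d[i] + ((even[i] + right) >> 1))
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x


if __name__ == "__main__":
    samples = [10, 12, 14, 13, 9, 8, 7, 20]
    low, high = lift_53_forward(samples)
    assert lift_53_inverse(low, high) == samples  # perfect reconstruction
```

Because each lifting step is undone exactly by its mirror step, the integer transform is lossless, and the divisions by 2 and 4 reduce to right shifts. The 9/7 mode uses non-integer lifting coefficients and is not covered by this sketch.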

A 256×256 2-D LDWT was designed and simulated in Verilog HDL and then synthesized with the Synopsys Design Compiler using the TSMC 0.18 μm 1P6M CMOS standard process technology. The detailed specifications of the 256×256 2-D LDWT are listed in Table 4.

Fig. 20. The processing procedures of 2-D dual-mode LDWTs under the same IRSA architecture.

The multi-level DWT computation can be implemented in a similar manner using the high-performance 1-level 2-D LDWT. For the multi-level computation, this architecture needs N^2/4 off-chip memory. As illustrated in Fig. 21, the off-chip memory temporarily stores the LL subband coefficients for the next iteration. The second-level computation requires N/2 counters and N/2 FIFOs for the control unit, and the third-level computation requires N/4 counters and N/4 FIFOs. In general, the jth-level computation needs N/2^(j-1) counters and N/2^(j-1) FIFOs.

Fig. 21. The multilevel 2-D DWT architecture.
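The resource counts quoted for the multi-level computation can be tabulated with a few lines of code. This is a minimal sketch that only evaluates the expressions N^2/4 and N/2^(j-1) given in the text; the function name `multilevel_resources` is hypothetical, and the unit of the off-chip memory (one entry per LL coefficient) is an assumption, since the text does not state it.

```python
def multilevel_resources(n, levels):
    """Evaluate the control-unit resources for levels 2..levels.

    n      : image width/height (N in the text), assumed a power of two
    levels : total number of decomposition levels (>= 1)

    Returns (off_chip_entries, per_level), where per_level is a list of
    (level j, counters, FIFOs) tuples using N / 2^(j-1) for each.
    """
    if n <= 0 or levels < 1:
        raise ValueError("n must be positive and levels at least 1")
    # Off-chip buffer for the LL subband of the first level: N^2 / 4.
    # Unit assumed here: one entry per LL coefficient.
    off_chip_entries = (n * n) // 4
    per_level = []
    for j in range(2, levels + 1):
        units = n // (2 ** (j - 1))  # N / 2^(j-1) counters and FIFOs
        per_level.append((j, units, units))
    return off_chip_entries, per_level


if __name__ == "__main__":
    entries, table = multilevel_resources(256, 3)
    print("off-chip entries:", entries)
    for level, counters, fifos in table:
        print(f"level {level}: {counters} counters, {fifos} FIFOs")
```

For N = 256 and three levels, the expressions give N/2 = 128 counters and FIFOs at the second level and N/4 = 64 at the third, matching the counts stated above.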


| 5/3 LDWT architecture | Ours | Diou et al., 2001 | Andra et al., 2002 | Chen & Wu, 2002 | Chen, 2002 | Chiang & Hsia, 2005 | Mei et al., 2006 | Huang et al., 2005 | Wu & Lin, 2005 |
|---|---|---|---|---|---|---|---|---|---|
| Transpose memory¹ (bytes) | 2N | 3.5N | 3.5N | 2.5N | 3N | N^2/4+5N | 2N | 3.5N | 3.5N |
| Computation time² | (3/4)N^2+(3/2)N+7 | --- | (N^2/2)+N+5 | N^2 | (N^2/2)+N+5 | N^2 | (N^2/2)+N | --- | 10+(4/3)N^2[1-(1/4)]+2N[1-(1/2)] |
| Adders | 8 | 12 | 8 | 6 | 5 | 4 | 8 | --- | --- |
| Multipliers | 0 | 6 | 4 | 4 | 0 | 0 | 0 | --- | 6 |

¹ Transpose memory size is used to store frequency coefficients in the 1-L 2-D DWT.

² In a system, computing time represents the time used to compute an image of size N×N.

³ Suppose the image is of size N×N.

Table 2. Comparisons of 2-D architectures for 5/3 LDWT.

| 9/7 LDWT architecture | Ours | Andra et al., 2002 | Jung & Park, 2005 | Chen, 2004¹ | Vishwanath et al., 1995 | Huang et al., 2005 | Huang et al., 2005 | Wu & Lin, 2005 | Lan et al., 2005 | Wu & Chen, 2001 |
|---|---|---|---|---|---|---|---|---|---|---|
| Transpose memory (bytes) | 4N | N^2 | 12N | N^2/4+LN+L | 22N | 14N | 5.5N | 5.5N | --- | N^2+4N+4 |
| Computation time | (3/4)N^2+(3/2)N+7 | 4N^2/3+2 | N^2 | N^2/2~(2/3)N | N^2 | --- | --- | 22+(4/3)N^2[1-(1/4)]+6N[1-(1/2)] | --- | 2N^2/3 |
| Adders | 16 | 8 | 12 | 4L | 36 | 16 | 16 | 8 | 32 | 16 |
| Multipliers | 0 | 4 | 9 | 4L | 36 | 12 | 10 | 6 | 20 | 16 |

¹ L: the filter length.

Table 3. Comparisons of the 2-D architectures for 9/7 LDWT.

| Item | Value |
|---|---|
| Chip specification | N = 256, tile size = 256×256 |
| Gate count | 29,196 gates |
| Power supply | 1.8 V |
| Technology | TSMC 0.18 μm 1P6M (CMOS) |
| On-chip memory size (transpose + internal) | 2-D 5/3 DWT: 512 bytes; 2-D 9/7 DWT: 1,024 bytes |
| Latency | (3/2)N+3 = 387 clock cycles |
| Computing time | (3/4)N^2+(3/2)N+7 = 49,543 clock cycles |
| Maximum clock rate | 83 MHz |

Table 4. Design specification of the proposed 2-D DWT.
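The latency and computing-time entries of Table 4 follow directly from the closed-form expressions. The sketch below evaluates them for N = 256 and converts the cycle count into wall-clock time at the maximum clock rate. The function names are hypothetical, and the time estimate assumes an idealized run at 83 MHz with no I/O stalls, which the source does not claim.

```python
def latency_cycles(n):
    """Output latency in clock cycles: (3/2)N + 3, for even N."""
    if n % 2 != 0:
        raise ValueError("N is assumed to be even")
    return 3 * n // 2 + 3


def computing_cycles(n):
    """Cycles to transform one N x N tile: (3/4)N^2 + (3/2)N + 7, for even N."""
    if n % 2 != 0:
        raise ValueError("N is assumed to be even")
    return 3 * n * n // 4 + 3 * n // 2 + 7


def tile_time_seconds(n, clock_hz):
    """Idealized time per tile, assuming one cycle per clock period."""
    return computing_cycles(n) / clock_hz


if __name__ == "__main__":
    n = 256
    print("latency:", latency_cycles(n), "cycles")
    print("computing time:", computing_cycles(n), "cycles")
    print("time per tile at 83 MHz:", tile_time_seconds(n, 83e6), "s")
```

For N = 256 the two expressions evaluate to 387 and 49,543 cycles, as listed in Table 4. Under the stated idealization, one 256×256 tile takes roughly 0.6 ms at 83 MHz.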

7. Conclusions

This work presents a new architecture that reduces the transpose memory requirement in 2-D LDWT. The proposed architecture has a mixed row- and column-wise signal flow, rather than the purely row-wise flow of traditional 2-D LDWT. Furthermore, we propose a new approach, the interlaced read scan algorithm (IRSA), to reduce the transpose memory for a 2-D dual-mode LDWT. The proposed 2-D architectures are more efficient than previous architectures in trading off low transpose memory, output latency, control complexity, and regular memory access sequence. The proposed architecture reduces the transpose memory significantly, to only 2N or 4N (5/3 or 9/7 mode), and reduces the latency to (3/2)N+3 clock cycles. Owing to the regularity and simplicity of the IRSA LDWT architecture, a dual-mode (5/3 and 9/7) 256×256 2-D LDWT prototype chip was designed in TSMC 0.18 μm 1P6M standard CMOS technology. The 5/3 and 9/7 filters, which have different lifting steps, are realized by cascading the four modules (split, predict, update, and scaling phases). The prototype chip has a gate count of 29,196 and can operate at 83 MHz. The method is applicable to any DWT-based signal compression standard, such as JPEG2000, Motion-JPEG2000, MPEG-4 still texture object decoding, and wavelet-based scalable video coding (SVC).

8. References

Andra, K.; Chakrabarti, C. & Acharya, T. (2000). A VLSI architecture for lifting-based wavelet transform, IEEE Workshop on Signal Processing Systems, (October 2000) pp. 70-79.

Andra, K.; Chakrabarti, C. & Acharya, T. (2002). A VLSI architecture for lifting-based forward and inverse wavelet transform, IEEE Transactions on Signal Processing, Vol. 50, No. 4, (April 2002) pp. 966-977.

Chen, P.-Y. (2002). VLSI implementation of discrete wavelet transform using the 5/3 filter, IEICE Transactions on Information and Systems, Vol. E85-D, No.12, (December 2002) pp. 1893-1897.

Chen, P.-Y. (2004). VLSI implementation for one-dimensional multilevel lifting-based wavelet transform, IEEE Transactions on Computer, Vol. 53, No. 4, (April 2004) pp. 386-398.

Chen, P. & Woods, J. W. (2004). Bidirectional MC-EZBC with lifting implementation, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 14, No. 10, (October 2004) pp. 1183-1194.

Chen, S.-C. & Wu, C.-C. (2002). An architecture of 2-D 3-level lifting-based discrete wavelet transform, VLSI Design/ CAD Symposium, (August 2002) pp. 351-354.

Chiang, J.-S. & Hsia, C.-H. (2005). An efficient VLSI architecture for 2-D DWT using lifting scheme, IEEE International Conference on Systems and Signals, (April 2005) pp. 528-531.

Christopoulos, C.; Skodras, A. N. & Ebrahimi, T. (2000). The JPEG2000 still image coding system: An overview, IEEE Trans. on Consumer Electronics, Vol. 46, No. 4, (November 2000) pp. 1103-1127.

Daubechies, I. & Sweldens, W. (1998). Factoring wavelet transforms into lifting steps, The Journal of Fourier Analysis and Applications, Vol. 4, No.3, (1998) pp. 247-269.

Diou, C.; Torres, L. & Robert, M. (2001). An embedded core for the 2-D wavelet transform, IEEE on Emerging Technologies and Factory Automation Proceedings, Vol. 2, (October 2001) pp. 179-186.

Habibi, A. & Hershel, R. S. (1974). A unified representation of differential pulse code modulation (DPCM) and transform coding systems, IEEE Transactions on Communications, Vol. 22, No. 5, (May 1974) pp. 692-696.

Hsia, C.-H. & Chiang, J.-S. (2008). New memory-efficient hardware architecture of 2-D dual-mode lifting-based discrete wavelet transform for JPEG2000, IEEE International Conference on Communication Systems, (November 2008) pp. 766-772.

Huang, C.-T.; Tseng, P.-C. & Chen, L.-G. (2002). Efficient VLSI architecture of lifting-based discrete wavelet transform by systematic design method, IEEE International Symposium Circuits and Systems, Vol. 5, (May 2002) pp. 26-29.

Huang, C.-T.; Tseng, P.-C. & Chen, L.-G. (2004). Flipping structure: An efficient VLSI architecture for lifting-based discrete wavelet transform, IEEE Transactions on Signal Processing, Vol. 52, No. 4, (April 2004) pp. 1080-1089.

Huang, C.-T.; Tseng, P.-C. & Chen, L.-G. (2005). VLSI architecture for lifting-based shape-adaptive discrete wavelet transform with odd-symmetric filters, Journal of VLSI Signal Processing Systems, Vol. 40, No. 2, (June 2005) pp. 175-188.

Huang, C.-T.; Tseng, P.-C. & Chen, L.-G. (2005). Analysis and VLSI architecture for 1-D and 2-D discrete wavelet transform, IEEE Transactions on Signal Processing, Vol. 53, No. 4, (April 2005) pp. 1575-1586.

Huang, C.-T.; Tseng, P.-C. & Chen, L.-G. (2005). Generic RAM-based architecture for two-dimensional discrete wavelet transform with line-based method, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 15, No. 7, (July 2005) pp. 910-919.

ISO/IEC 15444-1 JTC1/SC29 WG1. (2000). JPEG 2000 Part 1 Final Committee Draft Version 1.0, Information Technology.

ISO/IEC JTC1/SC29/WG1 Wgln 1684 (2000). JPEG 2000 Verification Model 9.0.

ISO/IEC 15444-1 JTC1/SC29 WG1. (2000). Motion JPEG2000, ISO/IEC 15444-3, Information Technology.

ISO/IEC JTC1/SC29 WG11. (2001). Coding of Moving Pictures and Audio, Information Technology.

Jiang, W. & Ortega, A. (2001). Lifting factorization-based discrete wavelet transform based architecture design, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 5, (May 2001) pp. 651-657.

Jung, G.-C. & Park, S.-M. (2005). VLSI implement of lifting wavelet transform of JPEG2000 with efficient RPA (recursive pyramid algorithm) realization, IEICE Transactions on Fundamentals, Vol. E88-A, No. 12, (December 2005) pp. 3508-3515.

Kondo, H. & Oishi, Y. (2000). Digital image compression using directional sub-block DCT, International Conference on Communications Technology, Vol. 1, (August 2000) pp. 985-992.

Lan, X.; Zheng, N. & Liu, Y. (2005). Low-power and high-speed VLSI architecture for lifting-based forward and inverse wavelet transform, IEEE Transactions on Consumer Electronics, Vol. 51, No. 2, (May 2005) pp. 379-385.

Li, W.-M.; Hsia, C.-H. & Chiang, J.-S. (2009). Memory-efficient architecture of 2-D dual-mode lifting scheme discrete wavelet transform for Motion-JPEG2000, IEEE International Symposium on Circuits and Systems, (May 2009) pp. 750-753.

Lian, C.-J.; Chen, K.-F.; Chen, H.-H. & Chen, L.-G. (2001). Lifting based discrete wavelet transform architecture for JPEG2000, IEEE International Symposium on Circuits and Systems, Vol. 2, (May 2001) pp. 445-448.

Mallat, S. G. (1989). A theory for multi-resolution signal decomposition: The wavelet representation, IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 11, No. 7, (July 1989) pp. 674-693.

Mallat, S. G. (1989). Multi-frequency channel decompositions of images and wavelet models, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-37, No. 12, (December 1989) pp. 2091-2110.

Marcellin, M. W.; Gormish, M. J. & Skodras, A. N. (2000). JPEG2000: The new still picture compression standard, ACM Multimedia Workshops, (September 2000) pp. 45-49.

Marino, F. (2000). Efficient high-speed/low-power pipelined architecture for the direct 2-D discrete wavelet transform, IEEE Transactions on Circuits and Systems II, Vol. 47, No. 12, (December 2000) pp. 1476-1491.

Martina, M. & Masera, G. (2007). Folded multiplierless lifting-based wavelet pipeline, IET Electronics Letters, Vol. 43, No. 5, (March 2007) pp. 27-28.

Mei, K.; Zheng, N. & van de Wetering, H. (2006). High-speed and memory-efficient VLSI design of 2-D DWT for JPEG2000, IET Electronics Letter, Vol. 42, No. 16, (August 2006) pp. 907-908.

Ohm, J.-R. (2005). Advances in scalable video coding, Proceedings of The IEEE, Invited Paper, Vol. 93, No. 1, (January 2005) pp. 42-56.

Richardson, I. (2003). H.264 and MPEG-4 Video Compression, John Wiley & Sons Ltd.

Seo, Y.-H. & Kim, D.-W. (2007). VLSI architecture of line-based lifting wavelet transform for Motion JPEG2000, IEEE Journal of Solid-State Circuits, Vol. 42, No. 2, (February 2007) pp. 431-440.

Sweldens, W. (1996). The lifting scheme: A custom-design construction of biorthogonal wavelets, Applied and Computational Harmonic Analysis, Vol. 3, No. 15, (1996) pp. 186-200.

Tan, K.C.B. & Arslan, T. (2001). Low power embedded extension algorithm for the lifting based discrete wavelet transform in JPEG2000, IET Electronics Letters, Vol. 37, No. 22, (October 2001) pp. 1328-1330.

Tan, K.C.B. & Arslan, T. (2003). Shift-accumulator ALU centric JPEG 2000 5/3 lifting based discrete wavelet transform architecture, IEEE International Symposium on Circuits and Systems, Vol. 5, (May 2003) pp. V161-V164.

Taubman, D. & Marcellin, M. W. (2001). JPEG2000 image compression fundamentals, standards, and practice, Kluwer Academic Publisher.

Varshney, H.; Hasan, M. & Jain, S. (2007). Energy efficient novel architecture for the lifting-based discrete wavelet transform, IET Image Process, Vol. 1, No. 3, (September 2007) pp. 305-310.

Vishwanath, M.; Owens, R. M. & Irwin, M. J. (1995). VLSI architecture for the discrete wavelet transform, IEEE Transactions on Circuits and Systems II, Vol. 42, No. 5, (May 1995) pp. 305-316.

Weeks, M. & Bayoumi, M. A. (2002). Three-dimensional discrete wavelet transform architectures, IEEE Transactions on Signal Processing, Vol. 50, No. 8, (August 2002) pp. 2050-2063.
