A Keyword Matching for the Retrieval of Low-Quality Hangul Document Images

Journal of the Korean Society for Library and Information Science, (P)1225-598X; (E)2982-6292
2013, v.47 no.1, pp.39-55
https://doi.org/10.4275/KSLIS.2013.47.1.039
In-Seop Na (Chonnam National University)
Sang-Cheol Park (Samsung Medison)
Soo-Hyung Kim (Chonnam National University)

Abstract

Keyword retrieval in low-quality Korean document images is difficult because such images contain adjacent characters that touch one another, and because images produced from various fonts are likely to be distorted during acquisition. In this paper, we propose and test a keyword retrieval system for low-quality Korean document images that uses a support vector machine (SVM) to decide whether two word images are similar. We demonstrate that the proposed method is more effective than an accumulated optical character recognition (OCR)-based search method, and that an SVM determines the similarity of two word images better than a Bayesian decision rule or an artificial neural network.

Keywords
Low-Quality Korean Document Keyword Retrieval, SVM, OCR, Digital Library
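The core idea in the abstract is to decide whether two word images show the same word with an SVM rather than by running OCR on them. The paper's actual features and training setup are not given here, so the following is only a minimal sketch under stated assumptions: word images are binary and already size-normalized; each image is described by hypothetical projection-profile features (column and row ink counts); a pair of images is represented by the absolute difference of their features plus a constant bias term; and a linear SVM trained with the Pegasos sub-gradient method (a stand-in solver, not necessarily the one the authors used) labels the pair as same (+1) or different (-1). The function names and toy 4x4 "word images" are illustrative only.

```python
import random
from typing import List, Sequence

Image = List[List[int]]  # binary word image: 1 = ink, 0 = background


def to_image(rows: Sequence[str]) -> Image:
    """Build a binary image from strings such as '#..#' ('#' = ink)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def word_features(img: Image) -> List[float]:
    """Hypothetical features: column and row ink counts (projection
    profiles). Assumes images were already normalized to a common size."""
    cols = [float(sum(col)) for col in zip(*img)]
    rows = [float(sum(row)) for row in img]
    return cols + rows


def pair_features(a: Image, b: Image) -> List[float]:
    """Pair representation: element-wise |difference| of the two feature
    vectors, plus a constant 1.0 that acts as the (regularized) bias."""
    fa, fb = word_features(a), word_features(b)
    return [abs(x - y) for x, y in zip(fa, fb)] + [1.0]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def train_linear_svm(X: List[List[float]], y: List[int], lam: float = 0.01,
                     epochs: int = 2000, seed: int = 0) -> List[float]:
    """Linear SVM via Pegasos: stochastic sub-gradient descent on
    lam/2 * ||w||^2 + mean hinge loss. Labels: +1 same word, -1 different."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # step size 1/(lam*t)
            violated = y[i] * dot(w, X[i]) < 1.0       # hinge loss is active
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularizer shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def is_same_word(w: List[float], a: Image, b: Image) -> bool:
    """Classify a pair of word images as the same word (True) or not."""
    return dot(w, pair_features(a, b)) > 0.0


# Toy data: three hypothetical "words" plus degraded copies (one pixel lost).
A = to_image(["#..#", "#..#", "####", "#..#"])
A_noisy = to_image(["#..#", "#...", "####", "#..#"])
B = to_image(["####", "#...", "#...", "####"])
B_noisy = to_image(["####", "#...", "....", "####"])
C = to_image([".##.", "#..#", "#..#", ".##."])

same = [(A, A), (A, A_noisy), (B, B), (B, B_noisy), (C, C)]
diff = [(A, B), (A, C), (B, C)]
X = [pair_features(a, b) for a, b in same + diff]
y = [1] * len(same) + [-1] * len(diff)
w = train_linear_svm(X, y)

if __name__ == "__main__":
    print(is_same_word(w, A_noisy, A_noisy), is_same_word(w, A, B))
```

Using clean and degraded copies of the same word as positive pairs is what lets the classifier tolerate small acquisition distortions instead of requiring an exact pixel match. A practical system would use richer image features and an established SVM library rather than this hand-rolled solver.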
