An Index-Based Search Method for Performance Improvement of Set-Based Similar Sequence Matching

KIPS Transactions on Software and Data Engineering, Vol. 6, No.11, pp.507-520, November 2017
DOI: 10.3745/KTSDE.2017.6.11.507

Abstract

The set-based similar sequence matching method measures similarity not between individual data items but between sets that group multiple data items. In this method, the similarity of two sets is represented by the size of their intersection. However, the method has two critical performance issues: 1) calculating the intersection size is time-consuming, and 2) the number of set pairs whose intersection size must be calculated is quite large. In this paper, we propose an index-based search method that improves the performance of set-based similar sequence matching by addressing these issues. Our method consists of two parts. First, we convert the set similarity problem into an intersection size comparison problem and provide an index structure that accelerates the intersection size calculation. Second, we propose an efficient set-based similar sequence matching method that exploits the proposed index structure. Through experiments, we show that the proposed method reduces the execution time by a factor of 30 to 50 compared with existing methods. We also show that the proposed method is scalable, since the performance gap widens as the number of data sequences increases.
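
The abstract does not describe the internals of the proposed index structure, so the following is only a minimal sketch of the general idea of computing intersection sizes through an index instead of comparing every pair of sets. It assumes a plain inverted index (element to set ids), which may differ from the structure proposed in the paper; the function names and the min_overlap parameter are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build_inverted_index(sets):
    """Map each element to the ids of the indexed sets that contain it."""
    index = defaultdict(list)
    for set_id, items in enumerate(sets):
        for item in set(items):  # de-duplicate so each set is listed once per element
            index[item].append(set_id)
    return index


def intersection_sizes(index, query):
    """Return {set_id: |query ∩ set|} for indexed sets sharing at least one element."""
    counts = defaultdict(int)
    for item in set(query):
        # Only sets that actually contain `item` are touched; sets that are
        # disjoint from the query are never visited.
        for set_id in index.get(item, ()):
            counts[set_id] += 1
    return dict(counts)


def search(index, query, min_overlap):
    """Ids of indexed sets whose intersection with `query` has at least min_overlap elements."""
    sizes = intersection_sizes(index, query)
    return sorted(sid for sid, n in sizes.items() if n >= min_overlap)


if __name__ == "__main__":
    data_sets = [{1, 2, 3}, {3, 4, 5}, {6, 7}]
    idx = build_inverted_index(data_sets)
    print(search(idx, {2, 3, 4}, min_overlap=2))
```

In this sketch, one pass over the query's elements yields the intersection size with every candidate set at once, which illustrates how an index can address both issues named above: the per-pair intersection cost and the number of pairs examined.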



Cite this paper

[KIPS Transactions Style]
J. Lee and H. Lim, "An Index-Based Search Method for Performance Improvement of Set-Based Similar Sequence Matching," KIPS Transactions on Software and Data Engineering, Vol.6, No.11, pp.507-520, 2017, DOI: 10.3745/KTSDE.2017.6.11.507.

[IEEE Style]
Juwon Lee and Hyo-Sang Lim, "An Index-Based Search Method for Performance Improvement of Set-Based Similar Sequence Matching," KIPS Transactions on Software and Data Engineering, vol. 6, no. 11, pp. 507-520, 2017. DOI: 10.3745/KTSDE.2017.6.11.507.

[ACM Style]
Lee, J. and Lim, H. 2017. An Index-Based Search Method for Performance Improvement of Set-Based Similar Sequence Matching. KIPS Transactions on Software and Data Engineering, 6, 11, (2017), 507-520. DOI: 10.3745/KTSDE.2017.6.11.507.