Purchase Transaction Similarity Measure Considering Product Taxonomy

KIPS Transactions on Software and Data Engineering, Vol. 8, No. 9, pp. 363-372, Sep. 2019
https://doi.org/10.3745/KTSDE.2019.8.9.363
Keywords: Sequence Similarity Measure, Transaction Data Analysis, Product Taxonomy, Levenshtein Distance, Dynamic Time Warping

A sequence is data in which the items are ordered, and purchase transaction data, which lists the products purchased by a customer, is a representative example of sequence data. In general, every product belongs to a product taxonomy, such as category / sub-category / sub-sub-category, and products with similar characteristics are classified into the same category. Therefore, when comparing two purchase transaction sequences, this paper considers not only the purchase order of the products but also assigns a higher similarity score to products that differ but belong to the same category. In particular, because the choice of similarity measure directly affects how well the similarity of purchase transaction sequences is calculated, we compare three representative measures: the Levenshtein distance, the dynamic time warping distance, and the Needleman-Wunsch similarity. We extend each of these existing methods to take the product taxonomy into account. Conventional similarity measures compare the products in two sequences by simply assigning a value of 0 or 1 depending on whether the products match. In contrast, the proposed method uses the product taxonomy tree to assign a value between 0 and 1, so that two different products can still be given differing degrees of relevance. Through experiments, we confirmed that the proposed method measures similarity more accurately than the previous methods. Furthermore, we confirmed that the dynamic time warping distance is the most suitable measure, because it considers the degree of association between the products in the sequences and performs well for two sequences of different lengths.
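
The idea of replacing the 0-or-1 match cost with a taxonomy-based value can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the sketch assumes that each product is represented by its category path and that the similarity of two products is the length of their shared path prefix divided by the length of the longer path. The names `product_similarity` and `taxonomy_dtw` are hypothetical.

```python
from typing import Sequence, Tuple

# A product is represented by its path in the taxonomy tree,
# e.g. ("food", "dairy", "milk"). This representation is an assumption.
Path = Tuple[str, ...]


def product_similarity(a: Path, b: Path) -> float:
    """Return a value in [0, 1]: shared prefix length over the longer path length.

    Assumed scoring rule for illustration only; identical products give 1.0,
    products that share no category give 0.0.
    """
    if not a or not b:
        return 0.0
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))


def taxonomy_dtw(s: Sequence[Path], t: Sequence[Path]) -> float:
    """Dynamic time warping distance with local cost 1 - product_similarity."""
    n, m = len(s), len(t)
    inf = float("inf")
    # d[i][j] is the minimal accumulated cost of aligning s[:i] with t[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 1.0 - product_similarity(s[i - 1], t[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


milk = ("food", "dairy", "milk")
cheese = ("food", "dairy", "cheese")
tv = ("electronics", "video", "tv")

same_category = taxonomy_dtw([milk, tv], [cheese, tv])
different_category = taxonomy_dtw([milk, tv], [tv, tv])
```

Under this assumed rule, replacing milk with cheese (same sub-category) yields a smaller distance than replacing milk with a television, whereas a plain 0-or-1 cost would treat both substitutions identically.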

Cite this article
[IEEE Style]
Y. Yang and K. Y. Lee, "Purchase Transaction Similarity Measure Considering Product Taxonomy," KIPS Transactions on Software and Data Engineering, vol. 8, no. 9, pp. 363-372, 2019. DOI: https://doi.org/10.3745/KTSDE.2019.8.9.363.

[ACM Style]
Yu-Jeong Yang and Ki Yong Lee. 2019. Purchase Transaction Similarity Measure Considering Product Taxonomy. KIPS Transactions on Software and Data Engineering, 8, 9, (2019), 363-372. DOI: https://doi.org/10.3745/KTSDE.2019.8.9.363.