This paper proposes SCHEMA, an algorithm for automated mapping between heterogeneous product taxonomies in the e-commerce domain. SCHEMA utilises word sense disambiguation techniques, based on ideas from the algorithm proposed by Lesk, in combination with the semantic lexicon WordNet. To find candidate map categories and determine path similarity, we propose a node matching function based on the Levenshtein distance. The final mapping quality score is calculated using the Damerau-Levenshtein distance and a node-dissimilarity penalty. The performance of SCHEMA was tested on three real-life datasets and compared with PROMPT and the algorithm proposed by Park & Kim. The results show that SCHEMA considerably improves both recall and F1-score, while maintaining similar precision.
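The abstract names two string-distance ingredients without giving their exact formulas. The following is a minimal Python sketch of how they could fit together, under stated assumptions. A Levenshtein distance between category labels is normalised to a similarity in [0, 1] and thresholded to decide whether two nodes match. A Damerau-Levenshtein distance (here the optimal string alignment variant) is then computed over whole category paths, treating two nodes as equal when they match. The normalisation, the 0.8 threshold, the choice of variant and all function names are hypothetical and are not taken from SCHEMA. The WordNet/Lesk disambiguation step and the node-dissimilarity penalty are omitted.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if characters equal)
            ))
        prev = curr
    return prev[-1]


def label_similarity(a: str, b: str) -> float:
    """Hypothetical normalisation: 1 - distance / longer length, case-insensitive."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def nodes_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical node matching rule: labels match if similarity >= threshold."""
    return label_similarity(a, b) >= threshold


def path_distance(p: list[str], q: list[str], threshold: float = 0.8) -> int:
    """Damerau-Levenshtein distance (optimal string alignment variant) between
    two category paths, where two nodes count as equal if nodes_match() says so."""
    n, m = len(p), len(q)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all nodes of p
    for j in range(m + 1):
        d[0][j] = j  # insert all nodes of q
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if nodes_match(p[i - 1], q[j - 1], threshold) else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete node
                          d[i][j - 1] + 1,         # insert node
                          d[i - 1][j - 1] + cost)  # keep or substitute node
            # Swap of two adjacent nodes counts as a single edit.
            if (i > 1 and j > 1
                    and nodes_match(p[i - 1], q[j - 2], threshold)
                    and nodes_match(p[i - 2], q[j - 1], threshold)):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


if __name__ == "__main__":
    src = ["Electronics", "Computers", "Laptops"]
    dst = ["Computers", "Electronics", "Laptop"]
    print(path_distance(src, dst))  # → 1 (one transposition; "Laptops" matches "Laptop")
```

Treating node equality as a thresholded label similarity lets near-identical labels such as "Laptops" and "Laptop" align without an edit. The transposition step lets paths that list the same categories in a slightly different order stay close.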
| Field | Details |
|---|---|
| Number of pages | 15 |
| Publication status | Published - 27 May 2012 |
| Event / Conference | The Interface for Dutch ICT-Research 2012 (ICT.OPEN 2012) |
| City | Rotterdam, the Netherlands |
| Period | 22 Oct 2012 → 23 Oct 2012 |