Sequence Mining Project



Introduction

Sequence mining, also called frequent sequence mining, is, as its name implies, the task of finding sequences (usually subsequences) that occur more often than others in a set of sequences. Unlike association patterns, the itemsets (or items) in a sequence pattern are ordered.
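To make "occurs more often" concrete, here is a minimal sketch of how the support of a pattern could be counted. It assumes a simplified setting in which every element of a sequence is a single item rather than an itemset, and the names is_subsequence and support are only illustrative, not part of this project.

```python
def is_subsequence(pattern, sequence):
    """Return True if the items of pattern appear in sequence in the same order.

    The items do not have to be adjacent; only their relative order matters.
    """
    remaining = iter(sequence)
    # "item in remaining" advances the iterator until the item is found,
    # so each later item is searched for only after the earlier ones.
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Count how many sequences in database contain pattern as a subsequence."""
    return sum(1 for sequence in database if is_subsequence(pattern, sequence))


# Hypothetical toy database of three sequences.
database = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
print(support(["a", "c"], database))  # 2
print(support(["c", "a"], database))  # 0, because order matters
```

The second call shows the difference from association patterns: the items a and c occur together in two sequences, but never in the order c, a.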

There are various algorithms for sequence mining, such as SPAM, GSP and PrefixSpan. The PrefixSpan algorithm is a relatively better choice for mining sequence patterns in a large set of sequences. With PrefixSpan, it is always convenient to divide a big problem into many small subproblems. Thanks to this feature, you can easily run it on the popular Hadoop platform as a MapReduce program without many modifications.
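The way PrefixSpan splits the work can be sketched as follows. This is only a simplified illustration, not the implementation used in this project: it assumes every element of a sequence is a single item (no itemsets), and the names prefixspan, mine and min_support are hypothetical.

```python
from collections import Counter


def prefixspan(database, min_support):
    """Return a dict mapping each frequent pattern (a tuple) to its support."""
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per projected sequence.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))

        for item, count in sorted(counts.items()):
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Project: keep only what follows the first occurrence of item.
            # Each projected database is an independent, smaller subproblem.
            smaller = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, smaller)

    mine((), [list(sequence) for sequence in database])
    return results


# Hypothetical toy database of three sequences.
database = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
for pattern, count in sorted(prefixspan(database, 2).items()):
    print(pattern, count)
```

Each call to mine only looks at its own projected database, so the subproblems for different prefixes do not depend on each other. That independence is what would let them be handed to separate workers.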

I have decided to use the PrefixSpan algorithm to implement sequence mining.



Breeze, 2010.11.11