11:15am to 12:15pm
Mining Useful Patterns
(Seminar/Conference)
The speaker is Dr. Jilles Vreeken of the Department of Mathematics and Computer Science at the University of Antwerp, Belgium. The abstract of his talk is as follows:

In short, this talk is about how to find interesting patterns and how to put them to good use in a variety of data mining tasks, beating the competition without having to set parameters, all by employing insights from information theory.

Pattern mining is a very powerful tool in exploratory data analysis. Given some dataset, the standard question is 'find me all patterns that are potentially interesting'. In practice, however, you will not want to ask that question. Typically, for any non-trivial interestingness threshold there will be far too many such patterns, orders of magnitude more than the size of the dataset. Moreover, most of these results will be redundant, being only variations on a theme. Finding the true nuggets among them becomes like finding the proverbial needle in a haystack.

Instead, you should ask 'find me the optimal set of patterns', where 'optimal' should reward small groups of patterns, low redundancy, and high quality. This is where information theory comes in: it gives us a principled way to formalise 'optimal' for our goal. Namely, we can use it to identify those patterns that describe the data best or, equivalently, that do the best job of predicting the data.

I will give a quick overview of the algorithms I have (co-)developed to identify high-quality pattern sets on binary data. I will also give a number of examples of how the resulting patterns can be put to good use in tasks including classification, one-class classification, anomaly detection, missing value estimation, concept-drift detection, and clustering, obtaining top-notch, highly interpretable results without having to set any parameters.
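The abstract's point that a threshold-based query returns far more patterns than there are rows can be seen on a toy example. The sketch below is not one of the speaker's algorithms. It is a brute-force frequent-itemset count over a made-up binary dataset, and the function name `frequent_itemsets`, the data, and the support threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def frequent_itemsets(rows, min_support):
    """Brute force: return {itemset: support} for every itemset
    that occurs in at least `min_support` rows."""
    items = sorted(set().union(*rows))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            # Support = number of rows that contain every item of the candidate.
            support = sum(1 for row in rows if set(candidate) <= row)
            if support >= min_support:
                result[candidate] = support
                found_any = True
        if not found_any:
            # Every subset of a frequent itemset is itself frequent, so if no
            # itemset of this size is frequent, no larger one can be either.
            break
    return result


# Hypothetical toy data: 4 rows over 6 binary items (an item is "present" in a row).
rows = [
    frozenset("abcdef"),
    frozenset("abcde"),
    frozenset("abcdf"),
    frozenset("abc"),
]

patterns = frequent_itemsets(rows, min_support=2)
print(len(rows), "rows,", len(patterns), "frequent itemsets")
```

Even with only four rows and a support threshold of two, the result contains many more itemsets than rows, and most of them are subsets of one another. This is the redundancy that motivates asking for a small, well-chosen set of patterns instead.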