Learning from data streams

ABSTRACT

In many applications today, data are best modeled not as persistent tables but as transient data streams. In this talk, we discuss the limitations of current machine learning and data mining algorithms when applied to such streams. We discuss the fundamental issues in learning in dynamic environments, such as continuously maintaining learning models that evolve over time, learning and forgetting, concept drift, and change detection. Data streams produce huge amounts of data, which introduces new constraints on the design of learning algorithms: limited computational resources in terms of memory, CPU power, and communication bandwidth. We present some illustrative algorithms, designed to take these constraints into account, for decision-tree learning, hierarchical clustering, and frequent pattern mining. We identify the main issues and current challenges that emerge in learning from data streams and that open research lines for further development.
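
To make the change-detection issue and the memory constraint concrete, the following is a minimal sketch of the Page-Hinkley test, a standard sequential test for an upward shift in the mean of a stream. It is not claimed to be one of the algorithms presented in the talk; the class name, the helper `first_alarm`, and the parameter values `delta` and `threshold` are assumptions chosen for illustration.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream.

    Keeps only four numbers of state, so memory use does not grow
    with the length of the stream.
    """

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # tolerated magnitude of change (assumed value)
        self.threshold = threshold  # alarm threshold, often called lambda (assumed value)
        self.n = 0                  # number of examples seen so far
        self.mean = 0.0             # running mean of the stream
        self.cum = 0.0              # cumulative deviation from the running mean
        self.min_cum = 0.0          # smallest cumulative deviation seen so far

    def update(self, x):
        """Process one example; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def first_alarm(stream, detector):
    """Return the index of the first example that triggers an alarm, or None."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Synthetic stream: the mean jumps from 0 to 1 at index 200.
    stream = [0.0] * 200 + [1.0] * 200
    print(first_alarm(stream, PageHinkley(threshold=5.0)))
```

Each example is processed once and then discarded, which is the single-pass, bounded-memory regime described above. A real deployment would typically reset the detector after an alarm and retrain or adapt the learning model.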

References

  • João Gama, Knowledge Discovery from Data Streams, Chapman & Hall/CRC Press (Data Mining and Knowledge Discovery Series), 2010.
  • João Gama, Indrė Žliobaitė, Albert Bifet, Mykola Pechenizkiy, A Survey on Concept Drift Adaptation, ACM Computing Surveys, Vol. 46, Nr. 4, pp. 40, 2014.
  • João Gama, A Survey on Learning from Data Streams: Current and Future Trends, Progress in Artificial Intelligence, Vol. 1, Nr. 1, Springer, 2012.