Piotr Lipinski, Computational Intelligence Research Group, Institute of Computer Science, University of Wroclaw

Advanced Data Mining

News:

Information:

This year, because I may be absent for research visits, the lecture will be taught jointly with my PhD students, Klaudia Balcer, Mikołaj Słupiński and Maria Szlasa. Each of us will be responsible for a part of the lecture and for both laboratory groups.

NEW Projects:

To present your project, please contact the lecturer responsible for it. You may present your project during lab sessions, during office hours or in additional time slots. We propose time slots for project presentations on Monday, June 16 and on Monday, June 30. Please book a time slot in Google Calendar here. You may also ask us by email for additional time slots, if needed.

NEW Mini-talks:

We propose an additional (non-obligatory) meeting for mini-talk presentations on Monday, June 16. If you are interested in preparing a mini-talk (see the list of available topics below), please contact me by email and book a time slot in Google Calendar here.

Exam/Project:

Students who complete their project with at least 50% of the points may be exempt from the written exam (and receive the same grade as for the labs). In this case, a few additional questions on topics outside the project may be asked during the project presentation.

Scores:

Scores are published in SKOS (login required).

Assignments:

List of assignments 1 PDF (deadline March 28, but each assignment presented by March 21 gives 1 bonus point)

- Jupyter Python notebook with Introduction to Time Series Clustering HTML IPYNB

- Data for the assignments: ZIP (the password is the title of our lecture written in lowercase without spaces - for copyright reasons, please do not publish the data or use them for purposes unrelated to our lecture)

List of assignments 2 PDF (deadline April 4, but each assignment presented by March 28 gives 1 bonus point)

- Jupyter Python notebook with Bid and Ask Reconstruction HTML IPYNB

List of assignments 3 PDF (deadline April 16 - please remember that April 9 follows the Friday schedule)

List of assignments 4 PDF (deadline May 9 - due to changes in the academic calendar concerning May 9 and May 5, the Friday group may send the solutions by email by May 9 and present them at the next meeting)

List of assignments 5 PDF (deadline May 23, but each assignment presented by May 16 gives 1 bonus point)

List of assignments 6 PDF (deadline June 6)

List of assignments 7 PDF (deadline June 13)

Project on Advanced Data Mining:

Project on Advanced Data Mining PDF

Mini-talks:

I propose the following topics for mini-talks. Please contact me if you are interested in preparing one. A mini-talk should last about 15 minutes and present the topic in more or less detail (depending on the particular topic and on how well it can be summarized in 15 minutes). We can organize an additional meeting with a session of mini-talks or hold them during regular labs.

1. Soft-DTW distance PDF

2. What is Zero-Shot Learning? What is One-Shot Learning? Definition, Selected Methods and Examples.

3. Bayesian probabilistic matrix factorization using Markov Chain Monte Carlo. LINK

4. SDNN: Symmetric deep neural networks with lateral connections for recommender systems LINK

5. Oops I Took A Gradient: Scalable Sampling for Discrete Distributions PDF

6. Hamiltonian Monte Carlo LINK

7. Changepoint Detection PDF

8. Time2Graph PDF

9. TOOLS: Orange Data Mining (requires a short demo) LINK

10. TOOLS: Yellowbrick: Machine Learning Visualization (requires a short demo) LINK

11. TOOLS: SHAP (requires a short demo) LINK

12. APPLICATIONS: RecoMed: A knowledge-aware recommender system for hypertension medications LINK

Lecture presentations:

Introduction PDF

Predicting Time Series Data PDF

Jupyter Python notebook with Introduction to Time Series Prediction on Airline Passengers HTML IPYNB

Additional materials: LINK1 LINK2 LINK3 LINK4

Time Series Clustering PDF

Jupyter Python notebook with Introduction to Time Series Clustering HTML IPYNB

Time Series Classification with Shapelets PDF

Introduction to Recommender Systems PDF

Matrix Factorization for Recommender Systems PDF

Sequential and Session-based Recommender Systems PDF

Deep Learning for Geospatial Data PDF

Self-Supervised Learning - Pretext Tasks PDF

Labs:

Session 1 - mini-course on advanced scientific Python: NumPy IPYNB, Pandas and Dask IPYNB, scikit-learn Pipeline, scikit-learn GridSearch and Optuna IPYNB, Weights and Biases IPYNB.

Session 2 - mini-course on advanced scientific Python: multiprocessing IPYNB, Numba, JIT, CUDA IPYNB, PyTorch IPYNB, TensorBoard IPYNB, Visualizations IPYNB.

Content (a draft general overview, in no particular order):

1. Time Series and Temporal Data from a Computer Science Perspective: clustering, classification, forecasting

2. Time Series and Temporal Data from a Probabilistic Perspective: autoregressive models

3. Time Series and Temporal Data from a Deep Learning Perspective: Recurrent Neural Networks, Transformers, Representation Learning, etc. (e.g. T-Loss, TST, TNC, TS2Vec, TRep)

4. Deep Learning for Geospatial Data: satellite image segmentation, satellite image time series prediction, etc. (e.g. UNet, Swin-UNet, SatMAE, ViTs, ViTs for SITS, Presto)

5. Recommender Systems: Collaborative Filtering, Matrix Factorization, Sequential and Session-based Recommender Systems (e.g. LightFM, NCF, SRGNN, TAGNN, LightGCN, DiffuASR)

6. Self- and Semi-Supervised Learning: contrastive learning, masked autoencoders, data augmentation, etc.

7. State Space Models: HMM, SLDS, Kalman Filters

8. Deep Learning in Graphs: GNN, GCN, Temporal Graph Networks, Deep Learning on Dynamic Graphs

9. Dimensionality Reduction, Representation Learning, etc.

10. Applications