Advanced Data Mining
News:
Information:
Please remember that Wednesday, April 9 follows the Friday schedule. We will have a lecture, but I suggest canceling our labs and moving them to the end of the semester to discuss projects and other issues.
Information:
This year, due to possible absences related to research visits, our lecture will be conducted jointly with my PhD students, Klaudia Balcer, Mikołaj Słupiński and Maria Szlasa. Each of us will be responsible for part of the lecture and for both laboratory groups.
Organizational Issues:
During the first two weeks, the labs will include an introductory mini-tutorial on advanced Python for AI/ML. You will learn about internal data representations, vector and matrix computations, parallel CPU and GPU computations, data analysis and manipulation, organizing AI/ML data processing workflows, optimizing hyperparameters, tracking machine learning experiments, and more. It is a good opportunity to practice useful techniques and tools!
Scores:
Scores are published in SKOS (login required).
Assignments:
List of assignments 1 PDF (deadline March 28; each assignment presented by March 21 earns 1 bonus point)
- Jupyter Python notebook with Introduction to Time Series Clustering HTML IPYNB
- Data for the assignments: ZIP (the password is the title of our lecture written in lowercase without spaces; due to copyright restrictions, please do not publish the data and do not use them for purposes unrelated to our lecture)
List of assignments 2 PDF (deadline April 4; each assignment presented by March 28 earns 1 bonus point)
- Jupyter Python notebook with Bid and Ask Reconstruction HTML IPYNB
- Data for the assignments: ZIP (the password is the title of our lecture written in lowercase without spaces; due to copyright restrictions, please do not publish the data and do not use them for purposes unrelated to our lecture)
List of assignments 3 PDF (deadline April 16; please remember that Wednesday, April 9 follows the Friday schedule)
Lecture presentations:
Predicting Time Series Data PDF
Jupyter Python notebook with Introduction to Time Series Prediction on Airline Passengers HTML IPYNB
Additional materials: LINK1 LINK2 LINK3 LINK4
Jupyter Python notebook with Introduction to Time Series Clustering HTML IPYNB
Time Series Classification with Shapelets PDF
Labs:
Session 1 - mini-course on advanced scientific Python: NumPy IPYNB, Pandas and Dask IPYNB, Scikit-learn Pipeline, Scikit-learn GridSearch and Optuna IPYNB, Weights and Biases IPYNB.
Session 2 - mini-course on advanced scientific Python: multiprocessing IPYNB, Numba, JIT, CUDA IPYNB, PyTorch IPYNB, TensorBoard IPYNB, Visualizations IPYNB.
Content (a general overview in draft form, in no particular order):
1. Time Series and Temporal Data from a Computer Science Perspective: clustering, classification, forecasting
2. Time Series and Temporal Data from a Probabilistic Perspective: autoregressive models
3. Time Series and Temporal Data from a Deep Learning Perspective: Recurrent Neural Networks, Transformers, Representation Learning, etc. (e.g. T-Loss, TST, TNC, TS2Vec, TRep)
4. Deep Learning for Geospatial Data: satellite image segmentation, satellite image time series prediction, etc. (e.g. UNet, Swin-UNet, SatMAE, ViTs, ViTs for SITS, Presto)
5. Recommender Systems: Collaborative Filtering, Matrix Factorization, Sequential and Session-based Recommender Systems (e.g. LightFM, NCF, SRGNN, TAGNN, LightGCN, DiffuASR)
6. Self- and Semi-Supervised Learning: contrastive learning, masked autoencoders, data augmentation, etc.
7. State Space Models: HMM, SLDS, Kalman Filters
8. Deep Learning on Graphs: GNN, GCN, Temporal Graph Networks, Deep Learning on Dynamic Graphs
9. Dimensionality Reduction, Representation Learning, etc.
10. Applications