Featured links:
- Big Data: SSD’s, R, and Linked Data Streams [1]
- Data Mining and Warehousing blog [2]
Links for the lab:
- Attach:zadania.pdf
- Introduction to the Octave and R languages
- Task 1 (exploratory data analysis)
- Task 2 (normal distribution and simulation/generation)
- Task 3 (multiple-kernel SVMs)
- Support Vector Machines, Tutorial Slides by Andrew Moore [3], Support Vector and Kernel Machines [4]
- Sparse and large-scale learning with heterogeneous data [5]
- SVM-Tutorial using R (e1071-package) [6]
- Shogun - A Large Scale Machine Learning Toolbox [7]
- Task 4 (Bayesian networks and association rules)
- Short Overview of Bayes Nets [8], Inference in Bayesian Networks [10], Learning Bayesian Networks [9]: Tutorial Slides by Andrew Moore
- Bayes Net Toolbox for Matlab [11], written by Kevin Murphy, 1997-2002: A Brief Introduction to Graphical Models and Bayesian Networks [12], How to use the toolbox [13]
- Task 5 (classification, decision trees)
- Statistics with R: Chapter 12 [14]: Using regression in a classification problem, Nearest Neighbours, Naive Bayes classifier
- A Practical Guide to Support Vector Classification [15]
- An Introduction to Recursive Partitioning Using the RPART Routines [16] (abridged version [17], without the theoretical description)
- Classification and regression by randomForest [18], R News, 3, 2002
- KLIMT Making trees interactive [19] (Klassification - Interactive Methods for Trees)
- Task 6 (clustering, hierarchical methods, estimating the number of clusters)
- Statistics with R: Clustering [20]
- Task 7 (data visualization)
- Statistics with R: Factorial methods: Around Principal Component Analysis (PCA) [21]
- PCA by Projection Pursuit. The Package pcaPP [22], Heinrich Fritz
- Dimensional Reduction for Data Mapping [23] (Jonathan Edwards and Paul Oman), R News, 3, 2003
- Task 8 (time series)
- Statistics with R: Time series [24]
- Data mining on time series: an illustration using fast-food restaurant franchise data [25]
- Task 9 (“non-standard” databases)
- Querying RDF Data from a Graph Database Perspective [26] (Renzo Angles and Claudio Gutierrez), Querying from a Graph Database Perspective: the case of RDF [27] (presentation)
- Tethering Cultural Data with RDF [28] (Kate Byrne)
- RelEx is an English-language semantic relationship extractor, built on the Carnegie-Mellon link parser
- Databases or database “interfaces”:
- Sesame [29] is an open source framework for storage, inferencing and querying of RDF data.
- Mondrian [30] is an OLAP server written in Java. It enables you to interactively analyze very large datasets stored in SQL databases without writing SQL.
- the eXist XML database [31]
- the RDF/SPARQL/XML part of the OpenLink Virtuoso system [32]
- the Jena/ARQ combo [33]
- HyperGraphDB [34]
- RDF Gateway [35] (commercial)
- 4Suite: an open-source platform for XML and RDF processing [36]
- Rx4RDF [37]
Other links for the class:
- Data Mining with R: learning by case studies [38] (by Luis Torgo), nice tutorial with SQL interaction
- Statistics with R [39], a very interesting but unpolished course on R
- Introduction to R and Exploratory data analysis [40]
- Using R for Introductory Statistics [41] (pre-draft of a published book)
- Using R for data analysis and graphics [42] (JH Maindonald)
- from Introduction to R [43] (a collection of links by Karl W Broman)
- Learning classifiers from distributed, semantically heterogeneous, autonomous data sources [44]
Proposed presentation topics (work in progress):
- Communicating with a database from within the R environment. [taken]
- Automatic report generation in the R environment. [available]
- Topics from Learning classifiers from distributed, semantically heterogeneous, autonomous data sources [45]: [all available]
- Chapter 3: “Learning classifiers from distributed data”,
- Chapter 4: “Learning classifiers from semantically heterogeneous data”,
- If there is interest, we can cover more.
- Topics from the book “The Handbook of Data Mining”. You may choose one chapter or several related chapters. [all available]
- 13 Distributed Data Mining (Byung-Hoon Park and Hillol Kargupta)
- II: MANAGEMENT OF DATA MINING
- 14 Data Collection, Preparation, Quality, and Visualization (Dorian Pyle)
- 15 Data Storage and Management (Tong (Teresa) Wu and Xiangyang (Sean) Li)
- 16 Feature Extraction, Selection, and Construction (Huan Liu, Lei Yu, and Hiroshi Motoda)
- 17 Performance Analysis and Evaluation (Sholom M. Weiss and Tong Zhang)
- 18 Security and Privacy (Chris Clifton)
- 19 Emerging Standards and Interfaces (Robert Grossman, Mark Hornick, and Gregor Meyer)
- III: APPLICATIONS OF DATA MINING
- 20 Mining Human Performance Data (David A. Nembhard)
- 21 Mining Text Data (Ronen Feldman)
- 22 Mining Geospatial Data (Shashi Shekhar and Ranga Raju Vatsavai)
- 23 Mining Science and Engineering Data (Chandrika Kamath)
- 24 Mining Data in Bioinformatics (Mohammed J. Zaki)
- 25 Mining Customer Relationship Management (CRM) Data (Robert Cooley)
- 26 Mining Computer and Network Security Data (Nong Ye)
- 27 Mining Image Data (Chabane Djeraba and Gregory Fernandez)
- 28 Mining Manufacturing Quality Data (Murat C. Testik and George C. Runger)
WARNING: Old stuff below.
Some random links for now:
- Information Retrieval (Searching the Web) [46], Tomasz Jurdzinski's course, and links from there
- Picture languages in machine understanding of medical visualization [47], Marek R. Ogiela, Ryszard Tadeusiewicz
- Learning the structure of image collections with latent aspect models [48], Florent Monay
- Recommendation Based on Personal Preference [49], Pei Wang. (Asking a database questions like: “give me the five best examples of a fast and cheap notebook”, “find a cheap ticket for a flight that leaves C around 9AM and arrives at D as early as possible”.)
- Sparse and large-scale learning with heterogeneous data [50], Google Tech Talk
- FilterBoost: Regression and Classification on Large Datasets [51], Joseph K. Bradley and Robert E. Schapire
Software:
- Vowpal Wabbit (Fast Online Learning) [52]
Places:
Amusements:
- Wikipedia:Petabyte#Petabytes_in_use
- Arstechnica: Rumors suggest Google is set to open scientific data store [54]
- Wired: This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize [55]