DataMining.DataMining History


May 21, 2009, at 02:54 PM by lukstafi - big data
Added line 2:
* [[http://radar.oreilly.com/2009/05/big-data-analytics-r-linked-data-ssd.html | Big Data: SSD's, R, and Linked Data Streams]]
February 06, 2009, at 12:56 AM by lukstafi - DM blog
Changed lines 1-3 from:
to:
Featured links:
* [[http://dataminingwarehousing.blogspot.com/ | Data Mining and Warehousing blog]]

Changed line 47 from:
* [[http://www.liaad.up.pt/~ltorgo/DataMiningWithR/ | Data Mining with R: learning by case studies]] (by Luis Torgo), seems excellent!
to:
* [[http://www.liaad.up.pt/~ltorgo/DataMiningWithR/ | Data Mining with R: learning by case studies]] (by Luis Torgo), nice tutorial with SQL interaction
September 19, 2008, at 02:21 PM by lukstafi - link to a great tutorial
Added lines 45-55:

Other links for the class:
* [[http://www.liaad.up.pt/~ltorgo/DataMiningWithR/ | Data Mining with R: learning by case studies]] (by Luis Torgo), seems excellent!
* [[http://zoonek2.free.fr/UNIX/48_R/all.html | Statistics with R]], a very interesting but unpolished course on R
* [[http://cc.oulu.fi/~jarioksa/opetus/metodi/eda.pdf | Introduction to R and Exploratory data analysis]]
* [[http://www.math.csi.cuny.edu/Statistics/R/simpleR/index.html | Using R for Introductory Statistics]] (pre-draft of a published book)
* [[http://cran.r-project.org/doc/contrib/usingR.pdf | Using R for data analysis and graphics]] (JH Maindonald)
** from [[http://www.biostat.wisc.edu/~kbroman/Rintro/ | Introduction to R]] (a collection of links by Karl W Broman)
* [[http://www.cs.iastate.edu/~honavar/Papers/caragea-thesis.pdf | Learning classifiers from distributed, semantically heterogeneous, autonomous data sources]]

Deleted lines 81-91:

Other links for the class:
* [[http://www.cs.iastate.edu/~honavar/Papers/caragea-thesis.pdf | Learning classifiers from distributed, semantically heterogeneous, autonomous data sources]]
* [[http://zoonek2.free.fr/UNIX/48_R/all.html | Statistics with R]], a very interesting course on R
* [[(Wikipedia:)Exploratory data analysis]]
* [[http://octave.sourceforge.net/doc/funref_statistics.html | Extra statistical functions for Octave]], among them @@boxplot@@.
* [[http://cc.oulu.fi/~jarioksa/opetus/metodi/eda.pdf | Introduction to R and Exploratory data analysis]]
* [[http://www.math.csi.cuny.edu/Statistics/R/simpleR/index.html | Using R for Introductory Statistics]] (pre-draft of a published book)
* [[http://cran.r-project.org/doc/contrib/usingR.pdf | Using R for data analysis and graphics]] (JH Maindonald)
** from [[http://www.biostat.wisc.edu/~kbroman/Rintro/ | Introduction to R]] (a collection of links by Karl W Broman)

Deleted line 4:
* [[Project, Dr. Leszek Grocholski -> Attach:projektHD.doc]]
June 20, 2008, at 09:45 AM by lukstafi - time series link
Added line 30:
** [[http://www.scausa.com/DataMiningOnTimeSeries2.pdf |  Data mining on time series: an illustration using fast-food restaurant franchise data]]
June 20, 2008, at 03:48 AM by lukstafi - RDF databases
Deleted line 30:
** [[http://mondrian.pentaho.org/ | Mondrian]] is an OLAP server written in Java. It enables you to interactively analyze very large datasets stored in SQL databases without writing SQL.
Changed lines 32-33 from:

to:
** [[http://www.ltg.ed.ac.uk/np/publications/ltg/papers/Byrne2006Tethering.pdf | Tethering Cultural Data with RDF]] (Kate Byrne)
*** [[http://www.opencog.org/wiki/RelEx | RelEx]] is an English-language semantic relationship extractor, built on the Carnegie-Mellon link parser
** Databases or database "interfaces":
*** [[http://openrdf.org/ | Sesame]] is an open source framework for storage, inferencing and querying of RDF data.
*** [[http://mondrian.pentaho.org/ | Mondrian]] is an OLAP server written in Java. It enables you to interactively analyze very large datasets stored in SQL databases without writing SQL.
*** [[the Exist XML database -> http://exist.sourceforge.net/]]
*** [[the RDF/SPARQL/XML part of the OpenLink Virtuoso system -> http://sourceforge.net/projects/virtuoso/]]
*** [[the Jena/ARQ combo -> http://jena.sourceforge.net/ARQ/]]
*** [[http://www.kobrix.com/hgdb.jsp | HyperGraphDB]]
*** [[http://www.intellidimension.com/ | RDF Gateway]] (commercial)
*** [[http://4suite.org/index.xhtml | 4Suite: an open-source platform for XML and RDF processing]]
*** [[http://rx4rdf.liminalzone.org/ | Rx4RDF]]

June 20, 2008, at 02:51 AM by lukstafi - RDF and graph databases
Changed lines 32-33 from:

to:
** [[http://www.dcc.uchile.cl/~cgutierr/papers/eswc05.pdf | Querying RDF Data from a Graph Database Perspective]] (Renzo Angles and Claudio Gutierrez), [[http://www.ciw.cl/material/irw-2005/2005-irw-gutierrez.pdf | Querying from a Graph Database Perspective: the case of RDF]] -- presentation

June 06, 2008, at 07:57 AM by lukstafi - projection pursuit PCA
Added lines 25-27:
** [[http://zoonek2.free.fr/UNIX/48_R/05.html | Statistics with R: Factorial methods: Around Principal Component Analysis (PCA)]]
** [[http://www.r-project.org/useR-2006/Slides/Fritz.pdf | PCA by Projection Pursuit. The Package pcaPP]] Heinrich Fritz
** [[http://cran.r-project.org/doc/Rnews/Rnews_2003-3.pdf | Dimensional Reduction for Data Mapping]] (Jonathan Edwards and Paul Oman) R News, 3, 2003
Changed line 18 from:
** [[http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf | A Practical Guide to Support Vector Classification]]
to:
** [[http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf | A Practical Guide to Support Vector Classification]]
May 30, 2008, at 04:19 PM by lukstafi - SVM guide
Added line 18:
** [[http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf | A Practical Guide to Support Vector Classification]]
May 23, 2008, at 03:07 AM by lukstafi - rpart
Added line 18:
** [[http://www.mayo.edu/hsr/techrpt/61.pdf | An Introduction to Recursive Partitioning Using the RPART Routines]] ([[http://ndc.mayo.edu/mayo/research/biostat/upload/rpartmini.pdf | abridged version]] -- without the theoretical description)
May 21, 2008, at 06:45 PM by lukstafi - PCA assignment
Changed lines 22-23 from:
* [[Zadanie7]] (time series)
to:
* [[Zadanie7]] (data visualization)
* [[Zadanie8]] (time series)
Changed line 25 from:
* [[Zadanie8]] ("non-standard" databases)
to:
* [[Zadanie9]] ("non-standard" databases)
May 21, 2008, at 06:31 PM by lukstafi - assignment plan
Changed lines 6-8 from:
* [[Zadanie1]]
* [[Zadanie2]]
* [[Zadanie3]]
to:
* [[Zadanie1]] (exploratory data analysis)
* [[Zadanie2]] (normal distribution and simulation/generation)
* [[Zadanie3]] (multiple-kernel SVMs)
Changed line 13 from:
* [[Zadanie4]]
to:
* [[Zadanie4]] (Bayesian networks and association rules)
Changed line 16 from:
* [[Zadanie5]]
to:
* [[Zadanie5]] (classification, decision trees)
Changed lines 20-27 from:
to:
* [[Zadanie6]] (clustering, hierarchical methods, estimating the number of clusters)
** [[http://zoonek2.free.fr/UNIX/48_R/06.html | Statistics with R: Clustering]]
* [[Zadanie7]] (time series)
** [[http://zoonek2.free.fr/UNIX/48_R/15.html | Statistics with R: Time series]]
* [[Zadanie8]] ("non-standard" databases)
** [[http://mondrian.pentaho.org/ | Mondrian]] is an OLAP server written in Java. It enables you to interactively analyze very large datasets stored in SQL databases without writing SQL.

May 16, 2008, at 07:38 AM by lukstafi - assignment 5 -- classification
Changed lines 16-20 from:
to:
* [[Zadanie5]]
** [[http://zoonek2.free.fr/UNIX/48_R/12.html | Statistics with R: Chapter 12]] Using regression in a classification problem, Nearest Neighbours, Naive Bayes classifier
** [[http://cran.r-project.org/doc/Rnews/Rnews_2002-3.pdf | Classification and regression by randomForest]] R News, 3, 2002
** [[http://stats.math.uni-augsburg.de/Klimt/features.html | KLIMT Making trees interactive]] (Klassification - Interactive Methods for Trees)

Changed line 24 from:
# Topics from the book "The Handbook of Data Mining"
to:
# Topics from the book "The Handbook of Data Mining". You may choose one chapter or several related chapters. [all available]
Changed line 35 from:
23 Mining Science and Engineering Data, p. 549 (Chandrika Kamath)
to:
## 23 Mining Science and Engineering Data, p. 549 (Chandrika Kamath)
April 25, 2008, at 12:32 PM by lukstafi - handbook presentation topics
Deleted lines 16-17:
**
Changed lines 24-43 from:
to:
# Topics from the book "The Handbook of Data Mining"
## 13 Distributed Data Mining, p. 341 (Byung-Hoon Park and Hillol Kargupta)
## Part II: Management of Data Mining. 14 Data Collection, Preparation, Quality, and Visualization, p. 365 (Dorian Pyle)
## 15 Data Storage and Management, p. 393 (Tong (Teresa) Wu and Xiangyang (Sean) Li)
## 16 Feature Extraction, Selection, and Construction, p. 409 (Huan Liu, Lei Yu, and Hiroshi Motoda)
## 17 Performance Analysis and Evaluation, p. 425 (Sholom M. Weiss and Tong Zhang)
## 18 Security and Privacy, p. 441 (Chris Clifton)
## 19 Emerging Standards and Interfaces, p. 453 (Robert Grossman, Mark Hornick, and Gregor Meyer)
## Part III: Applications of Data Mining. 20 Mining Human Performance Data, p. 463 (David A. Nembhard)
## 21 Mining Text Data, p. 481 (Ronen Feldman)
## 22 Mining Geospatial Data, p. 519 (Shashi Shekhar and Ranga Raju Vatsavai)
23 Mining Science and Engineering Data, p. 549 (Chandrika Kamath)
## 24 Mining Data in Bioinformatics, p. 573 (Mohammed J. Zaki)
## 25 Mining Customer Relationship Management (CRM) Data, p. 597 (Robert Cooley)
## 26 Mining Computer and Network Security Data, p. 617 (Nong Ye)
## 27 Mining Image Data, p. 637 (Chabane Djeraba and Gregory Fernandez)
## 28 Mining Manufacturing Quality Data, p. 657 (Murat C. Testik and George C. Runger)


April 25, 2008, at 07:07 AM by lukstafi - BNT, assignment 4
Changed lines 13-18 from:
to:
* [[Zadanie4]]
** [[http://www.autonlab.org/tutorials/shortbayes.html | Short Overview of Bayes Nets]], [[Inference in Bayesian Networks -> http://www.autonlab.org/tutorials/bayesinf.html]] [[http://www.autonlab.org/tutorials/bayesstruct.html | Learning Bayesian Networks]]: Tutorial Slides by Andrew Moore
** [[http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html | Bayes Net Toolbox for Matlab]] Written by Kevin Murphy, 1997--2002: [[http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html | A Brief Introduction to Graphical Models and Bayesian Networks]], [[http://www.cs.ubc.ca/~murphyk/Software/BNT/usage.html | How to use the toolbox]]

**

April 11, 2008, at 10:27 PM by lukstafi - presentation topics, first draft
Changed lines 14-21 from:
to:
Proposed presentation topics (work in progress):
# Communicating with a database from within the R environment. [taken]
# Automatic report generation in the R environment. [available]
# Topics from [[http://www.cs.iastate.edu/~honavar/Papers/caragea-thesis.pdf | Learning classifiers from distributed, semantically heterogeneous, autonomous data sources]]: [all available]
## Chapter three: "Learning classifiers from distributed data",
## Chapter four: "Learning classifiers from semantically heterogeneous data",
## If there is interest, we can cover more.

April 11, 2008, at 09:39 PM by lukstafi - learning from distributed sources
Added line 16:
* [[http://www.cs.iastate.edu/~honavar/Papers/caragea-thesis.pdf | Learning classifiers from distributed, semantically heterogeneous, autonomous data sources]]
Added lines 25-26:
WARNING: Old stuff below.
April 11, 2008, at 04:30 AM by lukstafi - SVMs, learning with heterogeneous data
Changed lines 9-14 from:
to:
** [[http://www.autonlab.org/tutorials/svm15.pdf | Support Vector Machines, Tutorial Slides by Andrew Moore]], [[http://www.support-vector.net/icml-tutorial.pdf | Support Vector and Kernel Machines]]
** [[http://video.google.pl/videoplay?docid=4867582015325197740 | Sparse and large-scale learning with heterogeneous data]]
** [[http://www.potschi.de/svmtut/svmtut.html | SVM-Tutorial using R (e1071-package)]]
** [[http://www.shogun-toolbox.org/ | Shogun - A Large Scale Machine Learning Toolbox]]

Added line 16:
* [[http://zoonek2.free.fr/UNIX/48_R/all.html | Statistics with R]], a very interesting course on R
April 04, 2008, at 08:04 AM by lukstafi - HD project
Added line 5:
* [[Project, Dr. Leszek Grocholski -> Attach:projektHD.doc]]
April 04, 2008, at 02:35 AM by lukstafi - assignment 3 real soon now
Changed lines 7-8 from:
to:
* [[Zadanie3]]
March 28, 2008, at 01:21 AM by lukstafi - assignment 2
Changed lines 1-8 from:
Links for the class:
to:

Links for the lab:
* [[Attach:zadania.pdf]]
* [[Attach:introOctaveR.pdf | Introduction to the Octave and R languages]]
* [[Zadanie1]]
* [[Zadanie2]]

Other links for the class:
Deleted lines 35-40:

Links for the lab:
* [[Attach:zadania.pdf]]
* [[Attach:introOctaveR.pdf | Introduction to the Octave and R languages]]
* [[Zadanie1]]

March 20, 2008, at 06:00 AM by lukstafi - lab links
Changed lines 30-34 from:
[[Attach:zadania.pdf]]
to:
Links for the lab:
* [[Attach:zadania.pdf]]
* [[Attach:introOctaveR.pdf | Introduction to the Octave and R languages]]
* [[Zadanie1]]

March 19, 2008, at 11:39 PM by lukstafi - R EDA link
Added line 4:
* [[http://cc.oulu.fi/~jarioksa/opetus/metodi/eda.pdf | Introduction to R and Exploratory data analysis]]
March 19, 2008, at 02:03 AM by lukstafi - R links
Changed lines 4-7 from:
to:
* [[http://www.math.csi.cuny.edu/Statistics/R/simpleR/index.html | Using R for Introductory Statistics]] (pre-draft of a published book)
* [[http://cran.r-project.org/doc/contrib/usingR.pdf | Using R for data analysis and graphics]] (JH Maindonald)
** from [[http://www.biostat.wisc.edu/~kbroman/Rintro/ | Introduction to R]] (a collection of links by Karl W Broman)

Changed lines 27-28 from:
to:
* [[http://www.wired.com/techbiz/media/magazine/16-03/mf_netflix?currentPage=1 | Wired: This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize]]
March 14, 2008, at 04:33 AM by lukstafi - extra stat for Octave
Changed lines 3-4 from:

to:
* [[http://octave.sourceforge.net/doc/funref_statistics.html | Extra statistical functions for Octave]], among them @@boxplot@@.
March 14, 2008, at 04:27 AM by lukstafi - exploratory data analysis
Added lines 1-4:
Links for the class:
* [[(Wikipedia:)Exploratory data analysis]]

Added lines 20-21:

[[Attach:zadania.pdf]]
January 26, 2008, at 08:56 PM by lukstafi - places
Added line 10:
Added lines 14-16:
Places:
* [[http://research.google.com/ | Google Research]]

January 25, 2008, at 07:41 PM by lukstafi - google datastore
Changed lines 14-15 from:
* [[Wikipedia:Petabyte#Petabytes_in_use]]
to:
* [[Wikipedia:Petabyte#Petabytes_in_use]]
* [[http://arstechnica.com/news.ars/post/20080122-rumors-suggest-google-is-set-to-open-scientific-data-store.html | Arstechnica: Rumors suggest Google is set to open scientific data store]]
January 24, 2008, at 10:58 PM by lukstafi - petabytes
Added lines 12-14:

Amusements:
* [[Wikipedia:Petabyte#Petabytes_in_use]]
December 27, 2007, at 09:36 PM by lukstafi - learning library
Added lines 9-11:

Software:
* [[http://hunch.net/~vw/ | Vowpal Wabbit (Fast Online Learning)]]
December 21, 2007, at 11:15 PM by lukstafi - FilterBoost
Changed lines 7-8 from:
* [[http://video.google.com/videoplay?docid=4867582015325197740 | Sparse and large-scale learning with heterogeneous data]], Google Tech Talk
to:
* [[http://video.google.com/videoplay?docid=4867582015325197740 | Sparse and large-scale learning with heterogeneous data]], Google Tech Talk
* [[http://www.cs.cmu.edu/~jkbradle/ | FilterBoost: Regression and Classification on Large Datasets]], Joseph K. Bradley and Robert E. Schapire
October 03, 2007, at 01:39 AM by lukstafi - Sparse and large-scale learning with heterogeneous data
Changed lines 6-7 from:
* [[http://nars.wang.googlepages.com/wang.preference.pdf | Recommendation Based on Personal Preference]], Pei Wang. (Asking a database questions like: "give me five best examples of fast and cheap notebook", "find cheap ticket for a flight that leaves C around 9AM and arrives at D as early as possible".)
to:
* [[http://nars.wang.googlepages.com/wang.preference.pdf | Recommendation Based on Personal Preference]], Pei Wang. (Asking a database questions like: "give me five best examples of fast and cheap notebook", "find cheap ticket for a flight that leaves C around 9AM and arrives at D as early as possible".)
* [[http://video.google.com/videoplay?docid=4867582015325197740 | Sparse and large-scale learning with heterogeneous data]], Google Tech Talk
September 15, 2007, at 01:39 PM by lukstafi - Pei Wang preferences
Added line 6:
* [[http://nars.wang.googlepages.com/wang.preference.pdf | Recommendation Based on Personal Preference]], Pei Wang. (Asking a database questions like: "give me five best examples of fast and cheap notebook", "find cheap ticket for a flight that leaves C around 9AM and arrives at D as early as possible".)
September 08, 2007, at 10:20 PM by lukstafi - IR, picture analysis
Added lines 1-5:
Some random links for now:

* [[http://www.ii.uni.wroc.pl/%7Etju/Wyszukiwanie07/wyszukiwanie07.html | Information Retrieval (Searching the Web)]], Tomasz Jurdzinski course and links from there
* [[http://portal.acm.org/citation.cfm?id=1133508 | Picture languages in machine understanding of medical visualization]], Marek R. Ogiela, Ryszard Tadeusiewicz
* [[http://library.epfl.ch/theses/?nr=3729 | Learning the structure of image collections with latent aspect models]], Florent Monay
Page last modified on May 21, 2009, at 02:54 PM