| NAME | FILESIZE (KB) | DESCRIPTION | X TYPE | Y TYPE | NCASES | NDIMS | SOURCE |
|---|---|---|---|---|---|---|---|
| 04cars | 106 | 2004 New Car and Truck Data | mixed | discrete | 428 | 19 | Source: Kiplinger's Personal Finance, December 2003, vol. 57, no. 12, pp. 104-123 www.kiplinger.com |
| 14cancer | 39383 | Gene expression data for 14 types of cancer | cts | discrete | 144 | 16063 | Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html |
| 20news_w100 | 242 | 100 words from 20newsgroups data 4 meta-groups | Sparse binary | discrete (4) | 16242 | 100 | Source: http://cs.nyu.edu/~roweis/data.html and http://people.csail.mit.edu/jrennie/20Newsgroups/ |
| 20news_w1000 | 26290 | Top 1000 words from the 20newsgroups data | Sparse count data | discrete (20) | 11256 | 1000 | Source: http://people.csail.mit.edu/jrennie/20Newsgroups/ Created by: Ben Marlin |
| adultCensus | 4654 | Adult census from UCI repository | mixed | binary | 45222 | 11 | Source: http://archive.ics.uci.edu/ml/datasets/Adult |
| alarmNetwork | 16 | The alarm monitoring system | | | | | Source: Beinlich, Ingo, H. J. Suermondt, R. M. Chavez, and G. F. Cooper (1989) "The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks" in Proc. of the Second European Conf. on Artificial Intelligence in Medicine (London, Aug.), 38, 247-256. |
| amlAll | 2451 | Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression | discrete | binary | 38+34 | 7129 | Source: http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi Created by: processAmlAllData.m |
| anscombe | 4 | Four very different data sets all with the same p-value | discrete | cts | 11 | 1 | Source: Built in R data set |
| autompg | 28 | Auto fuel consumption in miles per gallon, from UCI repository | mixed | real | 392 | 8 | Source: http://archive.ics.uci.edu/ml/datasets/Auto+MPG Created by: autompgMakeData.m |
| bankruptcy | 5 | | cts | binary | 66 | 2 | |
| bayesFactorGeneData | 13 | A synthetic data set to test detection of differential expression between genes | cts | binary | 100 | 2 | Created by: bayesFactorMakeGeneData.m |
| binaryImages | 42 | Small binary images | | | 7 | 150x150 | Source: Random binarized images from the web |
| biscuits | 428 | | cts | cts | | | |
| bishop2class | 13 | Classification data set from Bishop's Pattern Recognition and Machine Learning | cts | binary | 200 | 2 | Source: http://research.microsoft.com/en-us/um/people/cmbishop/prml/webdatasets/datasets.htm |
| bodyBrainWeight | 5 | | | Regression | 62 | 1 | |
| car | 18 | Car evaluation database | binary | discrete | 1728 | 6 | Source: http://archive.ics.uci.edu/ml/datasets/Car+Evaluation |
| casinoData | 4 | Synthetic correlated dice rolls from an HMM model | | | | | Created by: casinoDemo.m |
| caterpillar | 6 | Caterpillar nests data | cts | discrete | 33 | 10 | Source: http://www.ceremade.dauphine.fr/~xian/BCS/caterpillar |
| colon | 1141 | Princeton genomics Alon et al colon cancer data | cts | binary | 62 | 2000 | Source: http://genomics-pubs.princeton.edu/oncology/affydata/index.html Created by: processColonData.m |
| conjointAnalysisComputerBuyers | 186 | Computer survey data | |||||
| crabs | 35 | Leptograpsus crabs data | | Classification | 80 | 5 | Source: http://www.stats.ox.ac.uk/pub/PRNN/ |
| cubicData | 227 | | | Regression | 100 | 1 | |
| dags4 | 7 | All 4-node directed acyclic graphs | | | 543 | | Created by: mk_all_dags.m |
| dags5 | 270 | All 5-node directed acyclic graphs | | | 29281 | | Created by: mk_all_dags.m |
| darwin | 3320 | The full text of Darwin's "On the origin of species" | | | 20844 lines | | Source: http://www.gutenberg.org/etext/22764 |
| diabetes | 504 | ||||||
| digits3Htf | 1280 | Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal Service | | | 658 | 16x16 | Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html |
| dna | 2 | DNA sequences | | | 10 | 30 | |
| emailData | 2404 | Email classification data | | Classification | 1500 | 10000 | |
| facesCBCL | 3465 | ||||||
| facesOlivetti | 46026 | Images of faces | | Classification | 400 | 64x64 | Source: http://www.cs.toronto.edu/~roweis/data.html |
| facesYale | 54474 | ||||||
| failureTime | 2 | ||||||
| faithful | 12 | Old Faithful geyser data set | | | 272 | 2 | Source: http://research.microsoft.com/en-us/um/people/cmbishop/PRML/webdatasets/datasets.htm |
| fglass | 33 | Glass identification data set | | Classification | 214 | 10 | Source: http://archive.ics.uci.edu/ml/datasets/Glass+Identification |
| fisherIrisData | 2 | Iris species classification data | | Classification | 150 | 4 | Source: Built in to the MATLAB statistics toolbox |
| galaxies | 3 | Galaxy data frame records of the radial velocity of a spiral galaxy | | | 82 | 1 | Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html |
| gaussClassifMissingData | 3 | ||||||
| harvard500 | 34 | Harvard website link data | | | | | Source: http://www.harvard.edu/ |
| hastieMixture | 842 | Gaussian Mixture data used in Hastie's "Elements of Statistical Learning" figure 2.1-2.3 | | Binary Classification | 200 | 2 | Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/ |
| heightWeight | 14 | Height vs weight data for two groups: men and women | | Binary Classification | 200 | 2 | Source: http://www.stat.psu.edu/resources/bydata.htm |
| housing | 208 | Boston housing data from UCI repository | | Regression | 300 | 13 | Source: http://archive.ics.uci.edu/ml/support/Housing |
| hyvarinenBookImages | 1453 | ||||||
| ionosphere | 233 | Classification of radar returns from the ionosphere | | Classification | 351 | 34 | Source: http://archive.ics.uci.edu/ml/datasets/Ionosphere |
| knnClassify2d | 15 | ||||||
| knnClassify3c | 38 | ||||||
| labTemperature | 6 | ||||||
| letterA | 1499 | Various images of the letter A | | | 4 | 128x128 | |
| linRegSim1Ddata | 29 | ||||||
| lsi | 259 | ||||||
| markovLanguageClassif | 382 | ||||||
| marks | 3 | ||||||
| mnist1NNresults | 2 | ||||||
| mnistAll | 34763 | MNIST handwritten digits | 0-255 gray scale images | 0-9 class labels of digits | 60000 | 28x28 | Source: http://yann.lecun.com/exdb/mnist Created by: mnistMakeAll.m Contributed by: Yann Le Cun |
| morley | 3 | ||||||
| moteData | 5 | ||||||
| mpg | 56 | ||||||
| ngramData | 1435 | Unigram and Bigram data for Darwin's "On the origin of Species" | | | | | Source: Generated from this text |
| oilFlow3Class | 371 | Oil Flow data | | Classification | 1000 | 12 | Source: http://research.microsoft.com/en-us/um/people/cmbishop/PRML/webdatasets/datasets.htm |
| olivettiFaces | 6810 | ||||||
| pimatr | 21 | Diabetes in Pima Indian Women | | Binary Classification | 200 | 8 | Source: R built in dataset |
| pmtkImages | 938 | ||||||
| prostate | 35 | Prostate cancer data set | | Regression | 67 | 8 | Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/ |
| rainfall | 5 | ||||||
| rip | 103 | ||||||
| rosetta | 22137 | ||||||
| sachsCtsHtf | 1307 | ||||||
| sachsDiscretized | 29 | ||||||
| sarcosData | 25334 | SARCOS robot arm regression data | | Regression | 44484 | 21 | Source: http://www.gaussianprocess.org/gpml/data/ |
| sat | 3 | SAT scores | | Binary Classification | 30 | 1 | Source: Johnson and Albert p77 table 3.1 |
| schmeeHahn | 2 | ||||||
| senateVoting | 119 | ||||||
| servo | 6 | Highly non-linear data generated from a servo system simulation | | Regression | 167 | 4 | Source: http://archive.ics.uci.edu/ml/datasets/Servo |
| sewellShah | 6 | ||||||
| sinusoidData | 277 | ||||||
| soy | 14 | UCI Soybean data set | | Classification | 307 | 35 | Source: http://archive.ics.uci.edu/ml/datasets/Soybean+(Large) |
| spamData | 1019 | ||||||
| speechDataDigits4And5 | 4041 | Speech signals of the numbers "four" and "five" | | Binary Classification | 252 | | Source: http://people.csail.mit.edu/tommi/ |
| srbct | 7216 | ||||||
| stackloss | 2 | ||||||
| tinyImages | 20 | A collection of tiny images all roughly 30x30 | | | | | Source: google images |
| ugs4 | 2 | All 4-node undirected graphs | | | 64 | | |
| ugs5 | 15 | All 5-node undirected graphs | | | 1024 | | |
| uspsAll | 30952 | Handwritten digits | | Classification | 1100x10 | 16x16 | Source: http://www.cs.toronto.edu/~roweis/data.html |
| votes | 12 | ||||||
| vowelTrain | 52 | ||||||
| Ximg | 3 | 32x32 binary Image of letter X (0/1) | Binary | | 1 | 1024 | Source: http://www.cs.ubc.ca/~schmidtm/Software/UGM |
| XimgRgb | 4 | 32x32 color Image of letter X | Discrete | | 1 | 32x32x3 | Source: http://www.cs.ubc.ca/~schmidtm/Software/UGM Created by: XimgRbgMake.m |
| XwindowsDocData | 145 | ||||||
| yeastCellCycle | 2818 | ||||||
| yeastData310 | 28 | ||||||
| yeastStress | 15682 | Gasch's gene expression data (contains NaNs) | | Classification | 174 | 6152 | Source: http://genome-www.stanford.edu/yeast_stress/data.shtml Created by: parseYeastStressData.m |
| yeastUci | 110 | UCI Yeast data set: Predicting the Cellular Localization Sites of Proteins | | Classification | 1484 | 8 | Source: http://archive.ics.uci.edu/ml/datasets/yeast |