NAME  FILESIZE (KB)  DESCRIPTION  X TYPE  Y TYPE  NCASES  NDIMS  SOURCE 

04cars  106  2004 New Car and Truck Data  mixed  discrete  428  19  Source: Kiplinger's Personal Finance, December 2003, vol. 57, no. 12, pp. 104-123 www.kiplinger.com 
14cancer  39383  Gene expression data for 14 types of cancer  cts  discrete  144  16063  Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html 
20news_w100  242  100 words from the 20newsgroups data (4 metagroups)  Sparse binary  discrete (4)  16242  100  Source: http://cs.nyu.edu/~roweis/data.html and http://people.csail.mit.edu/jrennie/20Newsgroups/ 
20news_w1000  26290  Top 1000 words from the 20newsgroups data  Sparse count data  discrete (20)  11256  1000  Source: http://people.csail.mit.edu/jrennie/20Newsgroups/ Created by: Ben Marlin 
adultCensus  4654  Adult census from UCI repository  mixed  binary  45222  11  Source: http://archive.ics.uci.edu/ml/datasets/Adult 
alarmNetwork  16  The alarm monitoring system  Source: Beinlich, Ingo, H. J. Suermondt, R. M. Chavez, and G. F. Cooper (1989) "The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks" in Proc. of the Second European Conf. on Artificial Intelligence in Medicine (London, Aug.), 38, 247-256.  
amlAll  2451  Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression  discrete  binary  38+34  7129  Source: http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi Created by: processAmlAllData.m 
anscombe  4  Four very different data sets all with the same p-value  discrete  cts  11  1  Source: Built in R data set 
autompg  28  Auto fuel consumption in miles per gallon, from UCI repository  mixed  real  392  8  Source: http://archive.ics.uci.edu/ml/datasets/Auto+MPG Created by: autompgMakeData.m 
bankruptcy  5  cts  binary  66  2  
bayesFactorGeneData  13  A synthetic data set to test detection of differential expression between genes  cts  binary  100  2  Created by: bayesFactorMakeGeneData.m 
binaryImages  42  Small binary images  7  150x150  Source: Random binarized images from the web  
biscuits  428  cts  cts  
bishop2class  13  Classification data set from Bishop's Pattern Recognition and Machine Learning  cts  binary  200  2  Source: http://research.microsoft.com/en-us/um/people/cmbishop/prml/webdatasets/datasets.htm 
bodyBrainWeight  5  Regression  62  1  
car  18  Car evaluation database  binary  discrete  1728  6  Source: http://archive.ics.uci.edu/ml/datasets/Car+Evaluation 
casinoData  4  Synthetic correlated dice rolls from an HMM model  Created by: casinoDemo.m  
caterpillar  6  Caterpillar nests data  cts  discrete  33  10  Source: http://www.ceremade.dauphine.fr/~xian/BCS/caterpillar 
colon  1141  Princeton genomics Alon et al. colon cancer data  cts  binary  62  2000  Source: http://genomics-pubs.princeton.edu/oncology/affydata/index.html Created by: processColonData.m 
conjointAnalysisComputerBuyers  186  Computer survey data  
crabs  35  Leptograpsus crabs data  Classification  80  5  Source: http://www.stats.ox.ac.uk/pub/PRNN/  
cubicData  227  Regression  100  1  
dags4  7  All 4-node directed acyclic graphs  543  Created by: mk_all_dags.m  
dags5  270  All 5-node directed acyclic graphs  29281  Created by: mk_all_dags.m  
darwin  3320  The full text of Darwin's "On the origin of species"  20844 lines  Source: http://www.gutenberg.org/etext/22764  
diabetes  504  
digits3Htf  1280  Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal Service  658  16x16  Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html  
dna  2  DNA sequences  10  30  
emailData  2404  Email classification data  Classification  1500  10000  
facesCBCL  3465  
facesOlivetti  46026  Images of faces  Classification  400  64x64  Source: http://www.cs.toronto.edu/~roweis/data.html  
facesYale  54474  
failureTime  2  
faithful  12  Old Faithful geyser data set  272  2  Source: http://research.microsoft.com/en-us/um/people/cmbishop/PRML/webdatasets/datasets.htm  
fglass  33  Glass identification data set  Classification  214  10  Source: http://archive.ics.uci.edu/ml/datasets/Glass+Identification  
fisherIrisData  2  Iris species classification data  Classification  150  4  Source: Built in to the MATLAB statistics toolbox  
galaxies  3  Galaxy data frame records of the radial velocity of a spiral galaxy  82  1  Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html  
gaussClassifMissingData  3  
harvard500  34  Harvard website link data  Source: http://www.harvard.edu/  
hastieMixture  842  Gaussian Mixture data used in Hastie's "Elements of Statistical Learning" figure 2.12.3  Binary Classification  200  2  Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/  
heightWeight  14  Height vs weight data for two groups: men and women  Binary Classification  200  2  Source: http://www.stat.psu.edu/resources/bydata.htm  
housing  208  Boston housing data from UCI repository  Regression  300  13  Source: http://archive.ics.uci.edu/ml/support/Housing  
hyvarinenBookImages  1453  
ionosphere  233  Classification of radar returns from the ionosphere  Classification  351  34  Source: http://archive.ics.uci.edu/ml/datasets/Ionosphere  
knnClassify2d  15  
knnClassify3c  38  
labTemperature  6  
letterA  1499  Various images of the letter A  4  128x128  
linRegSim1Ddata  29  
lsi  259  
markovLanguageClassif  382  
marks  3  
mnist1NNresults  2  
mnistAll  34763  MNIST handwritten digits  0-255 gray scale images  0-9 class labels of digits  60000  28x28  Source: http://yann.lecun.com/exdb/mnist Created by: mnistMakeAll.m Contributed by: Yann Le Cun 
morley  3  
moteData  5  
mpg  56  
ngramData  1435  Unigram and Bigram data for Darwin's "On the origin of Species"  Source: Generated from this text  
oilFlow3Class  371  Oil Flow data  Classification  1000  12  Source: http://research.microsoft.com/en-us/um/people/cmbishop/PRML/webdatasets/datasets.htm  
olivettiFaces  6810  
pimatr  21  Diabetes in Pima Indian Women  Binary Classification  200  8  Source: R built in dataset  
pmtkImages  938  
prostate  35  Prostate cancer data set  Regression  67  8  Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/  
rainfall  5  
rip  103  
rosetta  22137  
sachsCtsHtf  1307  
sachsDiscretized  29  
sarcosData  25334  SARCOS robot arm regression data  Regression  44484  21  Source: http://www.gaussianprocess.org/gpml/data/  
sat  3  SAT scores  Binary Classification  30  1  Source: Johnson and Albert p77 table 3.1  
schmeeHahn  2  
senateVoting  119  
servo  6  Highly nonlinear data generated from a servo system simulation  Regression  167  4  Source: http://archive.ics.uci.edu/ml/datasets/Servo  
sewellShah  6  
sinusoidData  277  
soy  14  UCI Soybean data set  Classification  307  35  Source: http://archive.ics.uci.edu/ml/datasets/Soybean+(Large)  
spamData  1019  
speechDataDigits4And5  4041  Speech signals of the numbers "four" and "five"  Binary Classification  252  Source: http://people.csail.mit.edu/tommi/  
srbct  7216  
stackloss  2  
tinyImages  20  A collection of tiny images all roughly 30x30  Source: google images  
ugs4  2  All 4-node undirected graphs  64  
ugs5  15  All 5-node undirected graphs  1024  
uspsAll  30952  Handwritten digits  Classification  1100x10  16x16  Source: http://www.cs.toronto.edu/~roweis/data.html  
votes  12  
vowelTrain  52  
Ximg  3  32x32 binary Image of letter X (0/1)  Binary  1  1024  Source: http://www.cs.ubc.ca/~schmidtm/Software/UGM  
XimgRgb  4  32x32 color Image of letter X  Discrete  1  32x32x3  Source: http://www.cs.ubc.ca/~schmidtm/Software/UGM Created by: XimgRbgMake.m 
XwindowsDocData  145  
yeastCellCycle  2818  
yeastData310  28  
yeastStress  15682  Gasch's gene expression data (contains NaNs)  Classification  174  6152  Source: http://genome-www.stanford.edu/yeast_stress/data.shtml Created by: parseYeastStressData.m 
yeastUci  110  UCI Yeast data set: Predicting the Cellular Localization Sites of Proteins  Classification  1484  8  Source: http://archive.ics.uci.edu/ml/datasets/yeast 