PMTK Data


Revision Date: 24-Oct-2010

Auto-generated by generatePmtkDataTable.m

Click on the file name to see more information, and click on the file size to download the zip file.
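
Once an archive has been saved locally, unpacking it might look like the sketch below. It uses only the Python standard library. The function name `extract_dataset` is hypothetical, and the internal layout of the zip files is not documented on this page, so the code simply extracts everything and reports what it found rather than assuming particular file names.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir):
    """Unpack a downloaded dataset archive and return the extracted file paths.

    Hypothetical helper: nothing here is specific to PMTK, and no
    assumption is made about which files the archive contains.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Skip directory entries so only real files are returned.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    return sorted(dest / n for n in names)
```

For example, if the archive behind the 04cars file-size link were saved as `04cars.zip` (an assumed local file name), `extract_dataset("04cars.zip", "pmtkdata")` would return the paths of the files unpacked under `pmtkdata/`.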


NAME FILESIZE (KB) DESCRIPTION X TYPE Y TYPE NCASES NDIMS SOURCE
04cars 106 2004 New Car and Truck Data mixed discrete 428 19 Source: Kiplinger's Personal Finance, December 2003, vol. 57, no. 12, pp. 104-123 www.kiplinger.com
14cancer 39383 Gene expression data for 14 types of cancer cts discrete 144 16063 Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html
20news_w100 242 100 words from 20newsgroups data, 4 meta-groups Sparse binary discrete (4) 16242 100 Source: http://cs.nyu.edu/~roweis/data.html and http://people.csail.mit.edu/jrennie/20Newsgroups/
20news_w1000 26290 Top 1000 words from the 20newsgroups data Sparse count data discrete (20) 11256 1000 Source: http://people.csail.mit.edu/jrennie/20Newsgroups/
Created by: Ben Marlin
adultCensus 4654 Adult census from UCI repository mixed binary 45222 11 Source: http://archive.ics.uci.edu/ml/datasets/Adult
alarmNetwork 16 The alarm monitoring system   Source: Beinlich, Ingo, H. J. Suermondt, R. M. Chavez, and G. F. Cooper (1989) "The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks" in Proc. of the Second European Conf. on Artificial Intelligence in Medicine (London, Aug.), 38, 247-256.
amlAll 2451 Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression discrete binary 38+34 7129 Source: http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi
Created by: processAmlAllData.m
anscombe 4 Four very different data sets all with the same p-value discrete cts 11 1 Source: Built-in R data set
autompg 28 Auto fuel consumption in miles per gallon, from UCI repository mixed real 392 8 Source: http://archive.ics.uci.edu/ml/datasets/Auto+MPG
Created by: autompgMakeData.m
bankruptcy 5 cts binary 66 2  
bayesFactorGeneData 13 A synthetic data set to test detection of differential expression between genes cts binary 100 2 Created by: bayesFactorMakeGeneData.m
binaryImages 42 Small binary images   7 150x150 Source: Random binarized images from the web
biscuits 428 cts cts  
bishop2class 13 Classification data set from Bishop's Pattern Recognition and Machine Learning cts binary 200 2 Source: http://research.microsoft.com/en-us/um/people/cmbishop/prml/webdatasets/datasets.htm
bodyBrainWeight 5 Regression   62 1  
car 18 Car evaluation database binary discrete 1728 6 Source: http://archive.ics.uci.edu/ml/datasets/Car+Evaluation
casinoData 4 Synthetic correlated dice rolls from an HMM   Created by: casinoDemo.m
caterpillar 6 Caterpillar nests data cts discrete 33 10 Source: http://www.ceremade.dauphine.fr/~xian/BCS/caterpillar
colon 1141 Princeton genomics Alon et al colon cancer data cts binary 62 2000 Source: http://genomics-pubs.princeton.edu/oncology/affydata/index.html
Created by: processColonData.m
conjointAnalysisComputerBuyers 186 Computer survey data    
crabs 35 Leptograpsus crabs data Classification   80 5 Source: http://www.stats.ox.ac.uk/pub/PRNN/
cubicData 227 Regression   100 1  
dags4 7 All 4-node directed acyclic graphs   543 Created by: mk_all_dags.m
dags5 270 All 5-node directed acyclic graphs   29281 Created by: mk_all_dags.m
darwin 3320 The full text of Darwin's "On the Origin of Species"   20844 lines Source: http://www.gutenberg.org/etext/22764
diabetes 504    
digits3Htf 1280 Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal Service   658 16x16 Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html
dna 2 DNA sequences   10 30  
emailData 2404 Email classification data Classification   1500 10000  
facesCBCL 3465  
facesOlivetti 46026 Images of faces Classification   400 64x64 Source: http://www.cs.toronto.edu/~roweis/data.html
facesYale 54474  
failureTime 2    
faithful 12 Old Faithful geyser data set   272 2 Source: http://research.microsoft.com/en-us/um/people/cmbishop/PRML/webdatasets/datasets.htm
fglass 33 Glass identification data set Classification   214 10 Source: http://archive.ics.uci.edu/ml/datasets/Glass+Identification
fisherIrisData 2 Iris species classification data Classification   150 4 Source: Built into the MATLAB Statistics Toolbox
galaxies 3 Galaxy data frame records of the radial velocity of a spiral galaxy   82 1 Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html
gaussClassifMissingData 3    
harvard500 34 Harvard website link data   Source: http://www.harvard.edu/
hastieMixture 842 Gaussian mixture data used in Hastie's "Elements of Statistical Learning", Figures 2.1-2.3 Binary Classification   200 2 Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/
heightWeight 14 Height vs weight data for two groups: men and women Binary Classification   200 2 Source: http://www.stat.psu.edu/resources/bydata.htm
housing 208 Boston housing data from UCI repository Regression   300 13 Source: http://archive.ics.uci.edu/ml/support/Housing
hyvarinenBookImages 1453  
ionosphere 233 Classification of radar returns from the ionosphere Classification   351 34 Source: http://archive.ics.uci.edu/ml/datasets/Ionosphere
knnClassify2d 15    
knnClassify3c 38    
labTemperature 6    
letterA 1499 Various images of the letter A   4 128x128  
linRegSim1Ddata 29    
lsi 259    
markovLanguageClassif 382    
marks 3    
mnist1NNresults 2    
mnistAll 34763 MNIST handwritten digits 0-255 gray scale images 0-9 class labels of digits 60000 28x28 Source: http://yann.lecun.com/exdb/mnist
Created by: mnistMakeAll.m
Contributed by: Yann Le Cun
morley 3    
moteData 5    
mpg 56    
ngramData 1435 Unigram and bigram data for Darwin's "On the Origin of Species"   Source: Generated from this text
oilFlow3Class 371 Oil Flow data Classification   1000 12 Source: http://research.microsoft.com/en-us/um/people/cmbishop/PRML/webdatasets/datasets.htm
olivettiFaces 6810    
pimatr 21 Diabetes in Pima Indian women Binary Classification   200 8 Source: Built-in R data set
pmtkImages 938  
prostate 35 Prostate cancer data set Regression   67 8 Source: http://www-stat.stanford.edu/~tibs/ElemStatLearn/
rainfall 5    
rip 103    
rosetta 22137    
sachsCtsHtf 1307    
sachsDiscretized 29    
sarcosData 25334 SARCOS robot arm regression data Regression   44484 21 Source: http://www.gaussianprocess.org/gpml/data/
sat 3 SAT scores Binary Classification   30 1 Source: Johnson and Albert, p. 77, Table 3.1
schmeeHahn 2    
senateVoting 119    
servo 6 Highly non-linear data generated from a servo system simulation Regression   167 4 Source: http://archive.ics.uci.edu/ml/datasets/Servo
sewellShah 6    
sinusoidData 277    
soy 14 UCI Soybean data set Classification   307 35 Source: http://archive.ics.uci.edu/ml/datasets/Soybean+(Large)
spamData 1019    
speechDataDigits4And5 4041 Speech signals of the numbers "four" and "five" Binary Classification   252 Source: http://people.csail.mit.edu/tommi/
srbct 7216    
stackloss 2    
tinyImages 20 A collection of tiny images, all roughly 30x30 Source: Google Images
ugs4 2 All 4-node undirected graphs   64  
ugs5 15 All 5-node undirected graphs   1024  
uspsAll 30952 Handwritten digits Classification   1100x10 16x16 Source: http://www.cs.toronto.edu/~roweis/data.html
votes 12    
vowelTrain 52    
Ximg 3 32x32 binary image of letter X (0/1) Binary 1 1024 Source: http://www.cs.ubc.ca/~schmidtm/Software/UGM
XimgRgb 4 32x32 color image of letter X Discrete 1 32x32x3 Source: http://www.cs.ubc.ca/~schmidtm/Software/UGM
Created by: XimgRbgMake.m
XwindowsDocData 145    
yeastCellCycle 2818    
yeastData310 28    
yeastStress 15682 Gasch's gene expression data (contains NaNs) Classification   174 6152 Source: http://genome-www.stanford.edu/yeast_stress/data.shtml
Created by: parseYeastStressData.m
yeastUci 110 UCI Yeast data set: Predicting the Cellular Localization Sites of Proteins Classification   1484 8 Source: http://archive.ics.uci.edu/ml/datasets/yeast