Excellent R tutorial

November 26th, 2011

http://faculty.washington.edu/tlumley/Rcourse/R-fundamentals.pdf

R Fundamentals and Programming Techniques

Thomas Lumley

2006


Unsupervised Learning With Random Forest Predictors

November 26th, 2011

http://www.genetics.ucla.edu/labs/horvath/RFclustering/RFclustering/RandomForestHorvath.pdf

Unsupervised Learning With Random Forest Predictors

Tao SHI and Steve HORVATH

A random forest (RF) predictor is an ensemble of individual tree predictors. As part of their construction, RF predictors naturally lead to a dissimilarity measure between the observations. One can also define an RF dissimilarity measure between unlabeled data: the idea is to construct an RF predictor that distinguishes the “observed” data from suitably generated synthetic data. The observed data are the original unlabeled data and the synthetic data are drawn from a reference distribution. Here we describe the properties of the RF dissimilarity and make recommendations on how to use it in practice.

An RF dissimilarity can be attractive because it handles mixed variable types well, is invariant to monotonic transformations of the input variables, and is robust to outlying observations. The RF dissimilarity easily deals with a large number of variables due to its intrinsic variable selection; for example, the Addcl1 RF dissimilarity weighs the contribution of each variable according to how dependent it is on other variables.

We find that the RF dissimilarity is useful for detecting tumor sample clusters on the basis of tumor marker expressions. In this application, biologically meaningful clusters can often be described with simple thresholding rules.

Key Words: Biomarkers; Cluster analysis; Dissimilarity; Ensemble predictors; Tumor markers

www.genetics.ucla.edu/labs/horvath/RFclustering/RFclustering.htm

www.genetics.ucla.edu/labs/horvath/RFclustering/RFclustering/RFclusterTutorialTheory.PDF

www.genetics.ucla.edu/labs/horvath/RFclustering/RFclustering/FunctionsRFclustering.txt
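
The abstract above describes building a two-class problem from unlabeled data: the observed rows form one class and synthetic rows drawn from a reference distribution form the other. The following is only a minimal Python sketch of that data-preparation step, not the authors' code (their functions are in R, linked above). It assumes the reference distribution is the product of the empirical marginals, that is, each variable is resampled independently of the others, which is my reading of the Addcl1 variant. The function names `addcl1_synthetic` and `build_two_class_problem` and the toy data are hypothetical.

```python
import random


def addcl1_synthetic(observed, rng):
    """Return one synthetic row per observed row.

    Assumption: each column is sampled independently (with replacement)
    from its own observed values, which keeps the marginal distributions
    but breaks the dependence between variables.
    """
    n = len(observed)
    columns = list(zip(*observed))  # transpose rows into columns
    synthetic_columns = [[rng.choice(col) for _ in range(n)] for col in columns]
    return [list(row) for row in zip(*synthetic_columns)]


def build_two_class_problem(observed, seed=0):
    """Stack observed rows (label 1) on top of synthetic rows (label 2)."""
    rng = random.Random(seed)
    synthetic = addcl1_synthetic(observed, rng)
    features = [list(row) for row in observed] + synthetic
    labels = [1] * len(observed) + [2] * len(synthetic)
    return features, labels


# Hypothetical toy data with mixed variable types (numeric and categorical).
observed = [[1.0, "a"], [2.0, "b"], [3.0, "c"], [4.0, "d"]]
features, labels = build_two_class_problem(observed)
```

A random forest classifier would then be trained on `features` and `labels`, and the dissimilarity between two observed rows would be derived from how often they land in the same terminal node. That step is left out here because it depends on the random forest implementation being used.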

Hello world!

February 10th, 2009

Short Bio:
I am a Senior Statistician/Sr. SW Engineer at Sun Microsystems. Formerly I was an Analytic Science Manager and a Lead Scientist at Fair Isaac Corp., where I managed and executed data mining projects for Kraft Foods, Visa, Discover Financial, Cox Communications, and others.

My fields of expertise include statistical analysis and modeling and data mining, mainly predictive analytics. I have a Ph.D. in Theoretical and Mathematical Physics and a Ph.D. in Economics.

My Website: http://zolot.us

