Tuesday, 16 July 2013

Statistical Analysis for Data Science

To begin learning about data analysis and the tools available, it helps to understand the terminology.

Data analysis is a body of methods that help to describe facts, detect patterns,
develop explanations, and test hypotheses. It is used in all of the sciences. It
is used in business, in administration, and in policy.
(Levine, J.H.)

Some of the data analysis tools available:

Statistical Analysis with R and Microsoft SQL Server 2012
http://blog.sqltrainer.com/2011/12/statistical-analysis-with-r-and.html

The R Project for Statistical Computing
R is a free software environment for statistical computing and graphics.
http://www.r-project.org/
The R Journal: http://journal.r-project.org/current.html

RStudio
RStudio IDE is a powerful and productive user interface for R. 
http://www.rstudio.com/

R tutorial
Introductory tutorials for R, which simplifies many statistical computations and can be a powerful tool.
http://www.cyclismo.org/tutorial/R/

10 R Packages Every Data Scientist Should Know About
http://blog.yhathq.com/posts/10-R-packages-I-wish-I-knew-about-earlier.html

    sqldf (for selecting from data frames using SQL)
    forecast (for easy forecasting of time series)
    plyr (data aggregation)
    stringr (string manipulation)
    Database connection packages: RPostgreSQL, RMySQL, RMongo, RODBC, RSQLite
    lubridate (time and date manipulation)
    ggplot2 (data visualization)
    qcc (statistical quality control and QC charts)
    reshape2 (data restructuring)
    randomForest (random forest predictive models)

MSBI Academy
http://msbiacademy.com/
Learn Microsoft's BI software with an expert, using a library of free instructional videos. Topics cover the full range of Microsoft BI technologies from Data Modeling to Dashboard Design. 
