Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Tuesday, 16 July 2013

Statistical Analysis for Data Science

When you begin learning about data analysis and the tools available, it helps to understand the terminology first.

Data analysis is a body of methods that help to describe facts, detect patterns,
develop explanations, and test hypotheses. It is used in all of the sciences. It
is used in business, in administration, and in policy.
(Levine, J.H.)
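
As a rough illustration of that definition, here is a minimal sketch in Python using only the standard library. The data are made up for the example, and the function names (`describe`, `pearson`, `permutation_p_value`) are my own hypothetical helpers, not part of any package mentioned in this post. It describes the facts with summary statistics, looks for a pattern with a Pearson correlation, and tests the hypothesis of "no relationship" with a simple permutation test. The same steps could equally be carried out in R.

```python
import math
import random
import statistics

# Made-up illustrative data (an assumption for this sketch, not real measurements).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 71, 75, 80]


def describe(values):
    """Describe facts: basic summary statistics for one variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def pearson(x, y):
    """Detect patterns: Pearson correlation coefficient between x and y."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Test a hypothesis: how often does shuffled data correlate as strongly?"""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    print("Summary of scores:", describe(scores))
    print("Correlation:", round(pearson(hours, scores), 3))
    print("Permutation p-value:", permutation_p_value(hours, scores))
```

A small p-value here suggests that the observed correlation would be unlikely if hours and scores were unrelated. With only eight made-up points this is purely a demonstration of the three steps, not a recommended analysis.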

Some of the Data Analysis tools around:

Statistical Analysis with R and Microsoft SQL Server 2012

The R Project for Statistical Computing
R is a free software environment for statistical computing and graphics.
http://www.r-project.org/. The R Journal is here: http://journal.r-project.org/current.html

RStudio IDE is a powerful and productive user interface for R. 

R tutorial
Introductory tutorials for R, which simplifies many statistical computations and can be a powerful tool. http://www.cyclismo.org/tutorial/R/

10 R Packages Every Data Scientist Should Know About

    sqldf (for selecting from data frames using SQL)
    forecast (for easy forecasting of time series)
    plyr (data aggregation)
    stringr (string manipulation)
    Database connection packages: RPostgreSQL, RMySQL, RMongo, RODBC, RSQLite
    lubridate (time and date manipulation)
    ggplot2 (data visualization)
    qcc (statistical quality control and QC charts)
    reshape2 (data restructuring)
    randomForest (random forest predictive models)

MSBI Academy
Learn Microsoft's BI software with an expert, using a library of free instructional videos. Topics cover the full range of Microsoft BI technologies from Data Modeling to Dashboard Design. 

Monday, 1 July 2013

Database Landscape Map

451 Group has published its latest visualization of the databases on the market. It is an interesting graphic showing the vast number of database engines currently in the database space. The graphic can be found here