Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous; the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Friday, 5 December 2014

Data & the Quest for Understanding Complexity Part 3

Data's extraordinary adventure forges ahead.

Data's pioneering story continues to bring insight to the data and database world.

Wednesday, 3 December 2014

Monday, 1 December 2014

Sunday, 23 November 2014

PASS SQL Saturday Business Analytics

Yesterday was the first PASS SQL Saturday Business Analytics event in London and Europe. This event wouldn’t have been possible without the event organiser Jen Stirrup and a whole raft of other people. It was an amazing privilege to help at the event, alongside my colleague Jason Linham, for the Data Community. The theme of the event was business data now and in the future: interacting with data to drive the business, at the velocity of the business. This event supplements the core SQL Server community events such as SQLBits, PASS SQL Saturdays and Local User Groups. It was aimed at Business Analytics and Business Intelligence professionals who were interested in knowing more about analytical features in Excel, Big Data, Azure Machine Learning, Hadoop, R and SQL Server Business Intelligence.

The keynote, Data Culture: From BI & Analytics to Big Data and Data Science, was presented by Jonathan Woodward, Microsoft UK’s Business Lead for BI, Analytics, Big Data and Data Science.

There was an introductory session on predictive analytics which covered the history of data mining, algorithm choice and the CRISP-DM methodology. CRISP-DM (Cross-Industry Standard Process for Data Mining) provides a robust, structured approach to data mining through six phases: business understanding, data understanding, data preparation, modelling, evaluation and deployment.


A few useful links to find out more
There were various sessions on Excel charts, with a strong emphasis on never using 3D charts or cones for data visualization. (Edward Tufte's The Visual Display of Quantitative Information is a recommended read.)

A few resources to look at
Peltier Tech Excel Charts and Programming Blog http://peltiertech.com/
chandoo.org  http://chandoo.org/wp/
www.excelcharts.com/ for dashboards

One of my favourite sessions was delivered by Mark Wilcock on Using R, Cubes And Data Visualisation To Answer “What If” Questions. The session provided some interesting insight into how data exploration and data munging lead to data visualization, and it drew on a combined tool set. It was very interesting to see the benefits and disadvantages of each tool (R, Excel, SQL and Cubes) across the Load, Model and Visualization stack. He recommended watching a presentation delivered by Prof. Mark Whitehorn, School of Computing, University of Dundee, on the Monte Carlo scenario.

Microsoft Finance shared how they used the full set of BI tools to deliver finance dashboards and drill-down reports.

The day concluded with an excellent session from Chris Webb on the usage scenarios for Power Query and the use of the M language. 

This was a fun and informative day which I hope will be repeated next year.   

Wednesday, 12 November 2014

SQL Server 2014 In Memory Technology Benefits

I came across this video that mentions five core design points for SQL Server in-memory:

  • Built-in
  • Increases Speed and Throughput
  • Flexible
  • Easy to Implement
  • Workload-Optimized
    More details can be found here 

    Saturday, 8 November 2014

    MongoDB Days London 2014

    I attended the MongoDB event in London on 6 November. This was the first NoSQL event I have attended.

    MongoDB (from "humongous") is an open-source, general-purpose document database that is agile and scalable. The schema is dynamic and the data model can evolve as the application evolves. There are 3 core design principles behind MongoDB:

    • Increasing development productivity
    • Ensuring it is easy to maintain
    • Horizontal scalability

    New features in version 2.8 include document-level locking and pluggable storage engines. The WiredTiger storage engine (non-locking algorithms, access to data at RAM speed) is available in MongoDB.

    There are two base architecture models:

    • Replica Sets (for High Availability and Disaster Recovery)
    • Sharding (for spreading persisted data across several machines when the volume is too large for a single host)
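To make the sharding idea a little more concrete, here is a minimal sketch in Python of routing documents to shards by hashing a shard key. It is a toy model and not how MongoDB is implemented: the shard names, the `route` and `distribute` functions and the sample keys are all hypothetical, and a real cluster divides the key space into chunks that are balanced across shards, with mongos doing the routing.

```python
import hashlib

# Hypothetical shard names, for illustration only.
SHARDS = ["shard0", "shard1", "shard2"]


def route(shard_key_value, shards=SHARDS):
    """Pick a shard by hashing the shard key value (toy model)."""
    digest = hashlib.md5(shard_key_value.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def distribute(keys, shards=SHARDS):
    """Group keys by the shard that the toy router sends them to."""
    placement = {shard: [] for shard in shards}
    for key in keys:
        placement[route(key, shards)].append(key)
    return placement


if __name__ == "__main__":
    # Show how many of 12 sample keys land on each shard.
    sample = ["user-%d" % i for i in range(12)]
    for shard, keys in distribute(sample).items():
        print(shard, len(keys))
```

The same key always maps to the same shard, which is what lets a router find a document again without asking every machine.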

    The MongoDB Management Service (MMS) is a hosted service that provides monitoring, backup, and automated deployment of MongoDB instances. This tool will soon be available on premises as well. Currently, scripts for automating management can be deployed using tools such as Chef and Puppet, but these can be difficult to maintain. The new automation component of MMS makes deployment and elastic scale easy to manage.

    Backups can also be taken with the mongodump utility; however, if you need to restore the data, the indexes have to be rebuilt after the restore.

    Security in MongoDB includes database roles and can make use of certificates and encryption. MongoDB initially comes with no permissions set, so you can do everything. It is important to set permissions following the principle of least privilege. $redact is a new aggregation framework operator that restricts which parts of a document are returned to the viewer.
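As a rough illustration of what $redact does, the following Python sketch imitates its behaviour on a plain dictionary. In MongoDB itself the decision is written as an aggregation expression that resolves to $$DESCEND, $$PRUNE or $$KEEP at each document level; here the `redact` function, the `make_policy` helper, the `level` field and the sample record are all hypothetical, and nested arrays of arrays are not handled.

```python
KEEP, PRUNE, DESCEND = "KEEP", "PRUNE", "DESCEND"


def redact(doc, decide):
    """Imitate $redact: decide per document level what the viewer may see."""
    action = decide(doc)
    if action == PRUNE:
        return None   # drop this level and everything beneath it
    if action == KEEP:
        return doc    # return this level as-is, without inspecting deeper
    # DESCEND: keep scalar fields and apply the rule to embedded documents.
    result = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            kept = redact(value, decide)
            if kept is not None:
                result[key] = kept
        elif isinstance(value, list):
            items = []
            for item in value:
                if isinstance(item, dict):
                    kept = redact(item, decide)
                    if kept is not None:
                        items.append(kept)
                else:
                    items.append(item)
            result[key] = items
        else:
            result[key] = value
    return result


def make_policy(clearance):
    """Hypothetical rule: a level is visible if its 'level' <= clearance."""
    def decide(doc):
        # Assumption of this sketch: a missing 'level' counts as level 0.
        return DESCEND if doc.get("level", 0) <= clearance else PRUNE
    return decide


record = {
    "name": "Alice",
    "level": 1,
    "contact": {"level": 2, "email": "alice@example.com"},
    "notes": [
        {"level": 1, "text": "public note"},
        {"level": 3, "text": "restricted note"},
    ],
}

visible = redact(record, make_policy(clearance=2))
```

With a clearance of 2 the level 3 note is left out while the rest of the record is returned; with a clearance of 0 the whole record is pruned.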

    MongoDB can be used for analytics and has a business data connector to Hadoop.

    Tools for Troubleshooting
    % mongostat - Provides a quick overview of the status of a currently running mongod or mongos instance.
    % mongotop - Shows the amount of time a MongoDB instance spends reading and writing data.
    db.currentOp() - Returns information on in-progress operations for the database instance.
    db.serverStatus() - Provides an overview of the database process's state.
    rs.status() - Reflects the current status of the replica set.
    sh.status() - Reports on the sharding configuration and the information regarding existing chunks in a sharded cluster.

    The log explained  

    The mtools scripts help visualise the MongoDB log files. The commands used from this tool in the troubleshooting session were

    Terms mentioned during the day and their definitions

    Oplog - Stores an ordered history of logical writes to a MongoDB database.
    Config servers - Special mongod instances that store the metadata for a sharded cluster.
    mongod - The MongoDB database server.
    mongos - The routing and load balancing process that acts as an interface between an application and a MongoDB sharded cluster.
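The oplog definition is easier to picture with a small example. The Python sketch below replays a simplified, made-up oplog against an in-memory dictionary, roughly the way a secondary catches up with a primary. The entry layout is only loosely modelled on real oplog documents ("op" of "i", "u" or "d", with "o" and "o2" payloads); the `apply_oplog` function, the integer timestamps and the sample data are assumptions of this sketch.

```python
def apply_oplog(entries, initial=None):
    """Replay simplified oplog entries in timestamp order; return the new state."""
    # Copy the starting documents so the caller's data is not modified.
    state = {key: dict(doc) for key, doc in (initial or {}).items()}
    for entry in sorted(entries, key=lambda e: e["ts"]):
        op = entry["op"]
        if op == "i":      # insert: store a copy of the document under its _id
            state[entry["o"]["_id"]] = dict(entry["o"])
        elif op == "u":    # update: apply the $set fields to the target document
            target = state.get(entry["o2"]["_id"])
            if target is not None:
                target.update(entry["o"]["$set"])
        elif op == "d":    # delete: remove the document if it is present
            state.pop(entry["o"]["_id"], None)
    return state


# Made-up history of writes, for illustration only.
oplog = [
    {"ts": 1, "op": "i", "o": {"_id": 1, "name": "alpha", "qty": 5}},
    {"ts": 2, "op": "i", "o": {"_id": 2, "name": "beta", "qty": 7}},
    {"ts": 3, "op": "u", "o2": {"_id": 1}, "o": {"$set": {"qty": 6}}},
    {"ts": 4, "op": "d", "o": {"_id": 2}},
]

secondary = apply_oplog(oplog)
```

Replaying the same history a second time on top of `secondary` leaves it unchanged, which is the property that makes an ordered log of logical writes safe to re-apply.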

    The event provided a useful introduction to MongoDB.