Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Sunday 23 November 2014

PASS SQL Saturday Business Analytics

Yesterday was the first PASS SQL Saturday Business Analytics event in London and Europe.  This event wouldn’t have been possible without the event organiser Jen Stirrup and a whole raft of other people.  It was an amazing privilege to help at the event  also along side my colleague Jason Linham for the Data Community. The purpose of the event for Business Data now and future: interacting with data to drive the business, at the velocity of the business.  This event supplements the core SQL Server community events such as SQLBits, PASS SQL Saturdays and Local User Groups. The event was for those people Business Analytics and Business Intelligence professionals who were interested in knowing more about analytical features in Excel, Big Data, Azure Machine Learning, Hadoop, R and SQL Server Business Intelligence.

The keynote was presented by Jonathan Woodward, Microsoft UK’s Business Lead for BI, Analytics, Big Data and Data Science on Data Culture : From BI & Analytics to Big Data and Data Science.

There was an introductory session on predictive analytics which covered the history of data mining, algorithm choice and the CRISP-DM methodology. The CRISP-DM (cross-industry process for data mining) methodology is a methodology for providing a robust structured approach to data mining.


A few useful links to find out more
There were various sessions on excel charts with a strong emphasis on never using the 3Dcharts  or cones as a choice for data visualization. (Edward Tufte, The Visual Display of Quantitative Information is a recommended read)

A few resources to look at
Peltier Tech Excel Charts and Programming Blog http://peltiertech.com/
chandoo.org  http://chandoo.org/wp/
www.excelcharts.com/ for dashboards

One of my favourite sessions was delivered by Mark Wilcock on Using R, Cubes And Data Visualisation To Answer “What If” Questions. The session provided some interested insight into how data exploration and data munging lead to data visualization and drew on a combined tool set. It was very interesting to see the benefits and disadvantages of each area R, Excel, SQL and Cubes across the Load, Model and Visualization stack. He recommended watching a presentation delivered by Prof. Mark Whitehorn, School of Computing, University of Dundee on the Monte Carlo scenario.

Microsoft Finance shared how they used the full set of BI tools to deliver finance dashboard and drilldown reports.

The day concluded with an excellent session from Chris Webb on the usage scenarios for Power Query and the use of the M language. 

This was a fun and informative day which I hope will be repeated next year.   

Wednesday 12 November 2014

SQL Server 2014 In Memory Technology Benefits

I came across this video that mentions five core design points for SQL Server in-memory:

  • Built-in
  • Increases Speed and Throughput
  • Flexible
  • Easy to Implement
  • Workload-Optimized
    More details can be found here 

    Saturday 8 November 2014

    MongoDB Days London 2014

    I attended the MongoDB event in London on 6 November. This was the first NoSQL event I have attended.

    MongoDB  (from "humongous") is an open-source document database, agile, scalable and for general purpose data.  The schema is dynamic and the data model can evolve as the application evolves.  There are 3 core design principals to MongoDB

    • Increasing development productivity
    •  Ensuring it is easy to maintain
    •  Horizontal scalability

    New features in version 2.8 include document level locking and pluggable storage engines. The WiredTiger (Non-locking algorithms, access data at RAM speed) storage engine is available in MongoDB.

    There are two base architecture models

    • Replica Sets (for High Availability and Disaster Recovery)
    • Sharding (increasing the volume of persisted data too large for host machines)

    The MongoDB Management Service (MMS) is a hosted service that provides monitoring, backup, and automated deployment of MongoDB instances.  This tool will soon be available on premises as well. Currently scripts for automating management can be deployed using Chef  and Puppet etc.  These can be difficult to maintain. The new automation component of MMS makes deployment and elastic scale easy to manage.  

    Backups can also be done by using the mongodump utility however, if you need to restore the data you need to rebuild the indexes after restore.

    Security in MongoDB has databases roles, can use certificates and encryption. MongoDB comes initially with no permissions set so you can do everything. It is important to set permissions following the principle of least priveledge. $redact is a new  aggregation framework operator to protect data in the database from viewing.

    MongoDB can be used for analytics and has a business data connector to Hadoop.

    Tools for Troubleshooting
    % mongostat - Provides a quick overview of the status of a currently running mongod or mongos instance.
    % mongotop  - Shows the amount of time a MongoDB instance spends reading and writing data.
    Db.currentOp () -Returns information on in-progress operations for the database instance
    Db.serverStatus() - Provides an overview of the database process's state
    Rs.status() - Reflects the current status of the replica set
    Sh.status() - Reports on the sharding configuration and the information regarding existing chunks in a sharded cluster

    The log explained  

    M Tools scripts help visualise the MongoDB log files. The commands used  from this tool in the troubleshooting session were

    Terms mentioned during the day and their definitions

    Oplog - stores an ordered history of logical writes to a MongoDB database.
    Config servers -  are special mongod instances that store the metadata for a sharded cluster.  
    Mongod - The MongoDB database server.
    Mongos - The routing and load balancing process that acts an interface between an application and a MongoDB sharded cluster.

    The event provided a useful introduction to MongoDB.

    Wednesday 5 November 2014

    2014 PASS Summit Day 1 Keynote: Evolving Microsoft’s Data Platform – The Journey to Cloud Continues

    The Keynote Speakers were T.K. "Ranga" Rengarajan, James Phillips and Joseph Sirosh.

    Data is changing the way we work and live. There will be 40% more data year after year. We can lead this change of data culture where we harness more data types, tackle big data and deliver solutions faster.  The Microsoft Data Platform is a no compromise platform where there is no side of technology which is best. Everything just works together from scale up to scale out and from unstructured to structured data. It is a solution that scales with needs with no restrictions.

    The aims to capture diverse data, achieve elastic scale, maximize performance & availability and simplify with cloud. Microsoft want no boundaries, be able to maximise the customer experience and make it easy to run. 

    A recap on services

    • Capture diverse data with Azure DocumentDB (NoSQL DB Service which is schema free), Azure HDInsight (100% Apache Hadoop service with HBase and Storm), Analytics Platform System (Polybase SQL and Hadoop appliance) and Azure Search (Fully maanged search service for moblie apps).
    • Achieve elastic scale with 3 options SQL Server 2014 with Windows Server 2012R2, SQL Server in Azure VMs  and Azure SQL Database. Combines simplicity and power in scale out with the best tool for the job. We are at the beginning of amazing possibilities
    • Maximise performance and availability with Azure SQL Database and SQL Server 2014.
    • Simplify with Cloud using Hybrid scenarios with SQL Server 2014 and Azure as a data lake to store big data in its natural form. SQL Server benefits from the cloud as well and one engine can handle all workloads. Some assets however will always be on premises
    We need to think of data differently.  There will be an upcoming preview of Azure SQL Database which will include major changes to the platform.

    • TSQL compatibility
    • Larger index handling
    • Parallel queries
    • Extended events
    • In-memory ColumnStore for data marts

    All technologies are converging to make one solution.

    Taken from:  http://bit.ly/1qq05bo

    There is a need to understand the past, analyse the present and predict what’s next. The tools available are Azure Data Factory, Azure Stream Analytics and Azure Machine learning. Microsoft announced Free Machine Learning azure.com/ml

    Azure Data factory is like SSIS for orchestrating across the cloud without installing anything and is fully managed. Azure Stream Analytics is for data in motion and Machine learning for driving predications.

    The last section of the keynote covered Power Pivot and Power Query to allow you to get the data easier, model, cleanse and then transform it for upstream use.  Power BI SaaS offer simplifies deployment and reduce time to solution. Q& A natural language query in PowerBI allows users to ask questions of the data by typing.

    Live operational dashboards will be available in Power BI to drive business insight.