Welcome

Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein



Friday 19 December 2014

Database Research

My first research paper, from my doctoral research, has been published and available online 18 December 2014. It is currently an article in press, uncorrected proof. Articles in Press do not have all bibliographic details available yet. The full type set article will be available soon.

The paper is entitled The Usage of Best Practices and Procedures in the Database Community published on ScienceDirect as part of the Journal Information Systems Read More

There is an audio slides presentation introducing the paper.

V. Holt, et al., The usage of best practices and procedures in the database
community, Information Systems (2014),http://dx.doi.org/10.1016/j.is.2014.12.004

The Information Systems journal aims are

Databases: Their Creation, Management and Utilization

Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems.

Subject areas include data management issues as presented in the principal international database conferences (e.g. ACM SIGMOD, ACM PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining, information retrieval, internet and cloud data management, web semantics, visual and audio information systems, scientific computing, and organisational behaviour.

Friday 5 December 2014

Data & the Quest for Understanding Complexity Part 3


Data and his extraordinary adventure forges ahead.


















Data's pioneering story continues to bring insight to the data and database world.

Wednesday 3 December 2014

Monday 1 December 2014

Sunday 23 November 2014

PASS SQL Saturday Business Analytics



Yesterday was the first PASS SQL Saturday Business Analytics event in London and Europe.  This event wouldn’t have been possible without the event organiser Jen Stirrup and a whole raft of other people.  It was an amazing privilege to help at the event  also along side my colleague Jason Linham for the Data Community. The purpose of the event for Business Data now and future: interacting with data to drive the business, at the velocity of the business.  This event supplements the core SQL Server community events such as SQLBits, PASS SQL Saturdays and Local User Groups. The event was for those people Business Analytics and Business Intelligence professionals who were interested in knowing more about analytical features in Excel, Big Data, Azure Machine Learning, Hadoop, R and SQL Server Business Intelligence.

The keynote was presented by Jonathan Woodward, Microsoft UK’s Business Lead for BI, Analytics, Big Data and Data Science on Data Culture : From BI & Analytics to Big Data and Data Science.

There was an introductory session on predictive analytics which covered the history of data mining, algorithm choice and the CRISP-DM methodology. The CRISP-DM (cross-industry process for data mining) methodology is a methodology for providing a robust structured approach to data mining.
 



  















A few useful links to find out more
There were various sessions on excel charts with a strong emphasis on never using the 3Dcharts  or cones as a choice for data visualization. (Edward Tufte, The Visual Display of Quantitative Information is a recommended read)

A few resources to look at
Peltier Tech Excel Charts and Programming Blog http://peltiertech.com/
chandoo.org  http://chandoo.org/wp/
www.excelcharts.com/ for dashboards

One of my favourite sessions was delivered by Mark Wilcock on Using R, Cubes And Data Visualisation To Answer “What If” Questions. The session provided some interested insight into how data exploration and data munging lead to data visualization and drew on a combined tool set. It was very interesting to see the benefits and disadvantages of each area R, Excel, SQL and Cubes across the Load, Model and Visualization stack. He recommended watching a presentation delivered by Prof. Mark Whitehorn, School of Computing, University of Dundee on the Monte Carlo scenario.

Microsoft Finance shared how they used the full set of BI tools to deliver finance dashboard and drilldown reports.

The day concluded with an excellent session from Chris Webb on the usage scenarios for Power Query and the use of the M language. 

This was a fun and informative day which I hope will be repeated next year.   

Wednesday 12 November 2014

SQL Server 2014 In Memory Technology Benefits

I came across this video that mentions five core design points for SQL Server in-memory:

  • Built-in
  • Increases Speed and Throughput
  • Flexible
  • Easy to Implement
  • Workload-Optimized
    More details can be found here 


    Saturday 8 November 2014

    MongoDB Days London 2014








    I attended the MongoDB event in London on 6 November. This was the first NoSQL event I have attended.

    MongoDB  (from "humongous") is an open-source document database, agile, scalable and for general purpose data.  The schema is dynamic and the data model can evolve as the application evolves.  There are 3 core design principals to MongoDB

    • Increasing development productivity
    •  Ensuring it is easy to maintain
    •  Horizontal scalability

    New features in version 2.8 include document level locking and pluggable storage engines. The WiredTiger (Non-locking algorithms, access data at RAM speed) storage engine is available in MongoDB.

    There are two base architecture models

    • Replica Sets (for High Availability and Disaster Recovery)
    • Sharding (increasing the volume of persisted data too large for host machines)

    The MongoDB Management Service (MMS) is a hosted service that provides monitoring, backup, and automated deployment of MongoDB instances.  This tool will soon be available on premises as well. Currently scripts for automating management can be deployed using Chef  and Puppet etc.  These can be difficult to maintain. The new automation component of MMS makes deployment and elastic scale easy to manage.  

    Backups can also be done by using the mongodump utility however, if you need to restore the data you need to rebuild the indexes after restore.

    Security in MongoDB has databases roles, can use certificates and encryption. MongoDB comes initially with no permissions set so you can do everything. It is important to set permissions following the principle of least priveledge. $redact is a new  aggregation framework operator to protect data in the database from viewing.

    MongoDB can be used for analytics and has a business data connector to Hadoop.

    Tools for Troubleshooting
    % mongostat - Provides a quick overview of the status of a currently running mongod or mongos instance.
    % mongotop  - Shows the amount of time a MongoDB instance spends reading and writing data.
    Db.currentOp () -Returns information on in-progress operations for the database instance
    Db.serverStatus() - Provides an overview of the database process's state
    Rs.status() - Reflects the current status of the replica set
    Sh.status() - Reports on the sharding configuration and the information regarding existing chunks in a sharded cluster

    The log explained  


    M Tools scripts help visualise the MongoDB log files. The commands used  from this tool in the troubleshooting session were
    Mloginfo
    Mlogfilter
    Mplotqueries

    Definitions
    Terms mentioned during the day and their definitions

    Oplog - stores an ordered history of logical writes to a MongoDB database.
    Config servers -  are special mongod instances that store the metadata for a sharded cluster.  
    Mongod - The MongoDB database server.
    Mongos - The routing and load balancing process that acts an interface between an application and a MongoDB sharded cluster.

    The event provided a useful introduction to MongoDB.