Welcome

Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein



Sunday, 30 September 2018

MVP Wall

At Microsoft Ignite 2018 Microsoft devoted an entire wall to list all the names of the MVPs. I felt very humbled to have my name on the MVP wall with so many amazing people. It is such a privilege to be a part of the Microsoft Data Community. #datafamily #MVPbuzz




And there is my name.


Tuesday, 25 September 2018

SQLBits 2019

SQLBits 2019 has been announced. It is in the heart of Manchester. The last time it was in Manchester was in 2009. I am already excited about this next event. 




Monday, 24 September 2018

Azure SQL Database Managed Instance GA

At Microsoft Ignite it was announced that Azure SQL Database Managed Instance will be generally available on October 1, 2018.


Azure SQL Database Managed Instance is a deployment model of Azure SQL Database. It enables customers to migrate existing databases to a fully managed PaaS cloud environment. The Azure Database Migration Service (DMS) can be used to lift and shift on-premises SQL Server databases. Managed Instance can be useful for secure databases, and it reduces management overhead with automatic patching and version updates, automated backups and high availability.

Reading
Azure SQL Database Managed Instance, General Purpose tier general availability
Azure Database Migration Service and tool updates – Ignite 2018


SQL Server 2019, Big Data and AI


At Microsoft Ignite, SQL Server 2019 was announced. It is an amazing product for the future, combining SQL Server with big data and analytics. It is great to see multiple tools combined in one place: a one-stop shop for large and small data, structured and unstructured, from multiple sources.

There are 3 major components to SQL Server 2019.




















The creation of a data virtualization layer that handles the complexity of all data sources and formats, enabling the integration of structured and unstructured data without moving the data.

The streamlining of data management with SQL Server 2019 big data clusters deployed in Kubernetes, integrating HDFS and Spark. The architecture is explained in more depth here.




The creation of a complete AI platform that can use Spark to analyse both structured and unstructured data anywhere, and that can use SQL Server Machine Learning Services and SparkML.




In summary, SQL Server big data clusters allow you to deploy scalable clusters of SQL Server, Spark, and HDFS Docker containers running on Kubernetes.

Read More




Sunday, 23 September 2018

Microsoft Ignite - watch live


Microsoft Ignite is happening this week. Unfortunately I won't be there, but the keynote and some deep dive sessions will be streamed live. I am looking forward to seeing what Microsoft CEO Satya Nadella shares as his vision for the future of tech. I am interested to see which other tools and technologies will play an important part in the next year, and excited to see how data fits into this vision.

To watch the live stream, the meeting invite gives the following details:

Time: 09:00 - 17:15 (starts at 14:00 UK time)
Date: Monday 24 September 2018
Time Zone: Eastern Time (US and Canada)

Friday, 14 September 2018

Azure Cosmos DB multi-model database

Azure Cosmos DB has to be one of my favorite databases due to the breadth of available database types, its choice of consistency models and elastic scale out.

An introduction can be read here.

A definition for each of these types of databases is given below.








Key-value
A key-value pair (KVP) is a set of two linked data items: a key, which is a unique identifier for some item of data, and the value, which is either the data that is identified or a pointer to the location of that data. Key-value pairs are frequently used in lookup tables, hash tables and configuration files.
https://searchenterprisedesktop.techtarget.com/definition/key-value-pair

Column
A column-oriented DBMS (or columnar database management system) is a database management system (DBMS) that stores data tables by column rather than by row.
https://en.wikipedia.org/wiki/Column-oriented_DBMS

Document
Document stores, also called document-oriented database systems, are characterized by their schema-free organization of data. That means records do not need to have a uniform structure, i.e. different records may have different columns. The types of the values of individual columns can be different for each record. Columns can have more than one value (arrays). Records can have a nested structure. E.g. MongoDB
https://db-engines.com/en/article/Document+Stores

Graph
A graph database, also called a graph-oriented database, is a type of NoSQL database that uses graph theory to store, map and query relationships. Every node in a graph database is defined by a unique identifier, a set of outgoing edges and/or incoming edges and a set of properties expressed as key/value pairs.
https://whatis.techtarget.com/definition/graph-database
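
To make the four definitions above more concrete, here is a minimal in-memory sketch in Python. It only illustrates the shape of each data model; it is not the Azure Cosmos DB API, and every name in it (key_value_store, column_store, documents, graph, neighbours) is hypothetical.

```python
# Toy illustrations of four data models. These are plain Python structures,
# not calls to Azure Cosmos DB or any other database.

# Key-value: a unique key identifies a value.
key_value_store = {"user:1": "Ada", "user:2": "Grace"}

# Column-oriented: the same table stored column by column instead of row by row.
rows = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Grace", "city": "New York"},
]
column_store = {col: [row[col] for row in rows] for col in rows[0]}

# Document: records need not share a structure; values can be arrays or nested.
documents = [
    {"id": 1, "name": "Ada", "languages": ["English", "French"]},
    {"id": 2, "name": "Grace", "address": {"city": "New York"}},
]

# Graph: each node has an identifier, properties, and a set of outgoing edges.
graph = {
    "ada": {"properties": {"kind": "person"}, "edges": [("knows", "grace")]},
    "grace": {"properties": {"kind": "person"}, "edges": []},
}


def neighbours(graph, node_id, label):
    """Return the ids of nodes reached from node_id via edges with this label."""
    return [target for edge_label, target in graph[node_id]["edges"]
            if edge_label == label]
```

For example, column_store["name"] holds every name in one list, while neighbours(graph, "ada", "knows") follows a relationship rather than looking up a key.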


The five consistency levels offer predictable low latency guarantees and multiple well-defined relaxed consistency models.


Consistency Levels and guarantees

Consistency Level | Guarantees
Strong | Linearizability. Reads are guaranteed to return the most recent version of an item.
Bounded Staleness | Consistent Prefix. Reads lag behind writes by at most k prefixes or t interval.
Session | Consistent Prefix. Monotonic reads, monotonic writes, read-your-writes, write-follows-reads.
Consistent Prefix | Updates returned are some prefix of all the updates, with no gaps.
Eventual | Out of order reads.
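
The guarantees listed above can be made a little more tangible with a toy model. The sketch below treats the writes as an ordered log and checks what a reader has observed against it. It is a simplified illustration under that assumption, not how Cosmos DB implements consistency, and the function names are hypothetical.

```python
from typing import Sequence


def is_strong(writes: Sequence[str], observed: Sequence[str]) -> bool:
    """Strong: the reader sees every write, i.e. the most recent version."""
    return list(observed) == list(writes)


def is_consistent_prefix(writes: Sequence[str], observed: Sequence[str]) -> bool:
    """Consistent prefix: the reader sees a prefix of the write log, with no gaps."""
    return list(observed) == list(writes[:len(observed)])


def within_bounded_staleness(writes: Sequence[str], observed: Sequence[str], k: int) -> bool:
    """Bounded staleness: a consistent prefix that lags by at most k writes."""
    lag = len(writes) - len(observed)
    return is_consistent_prefix(writes, observed) and lag <= k
```

With a log of four writes, a reader that has seen only the first two satisfies consistent prefix, satisfies bounded staleness for k = 2 but not for k = 1, and does not satisfy strong consistency. A reader that has seen the first and third write but not the second fails all three checks, which is the kind of out-of-order read that eventual consistency permits.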



There is a useful capacity planner that looks at request unit throughput per second, request unit consumption and the amount of data storage needed by your application.
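
As a rough illustration of the arithmetic behind this kind of planning, the sketch below adds up request units per second from operation rates and per-operation costs. The function name and all the numbers are hypothetical; the real request unit charge of an operation depends on your data and queries and should be measured rather than assumed.

```python
def required_throughput(reads_per_sec: float, ru_per_read: float,
                        writes_per_sec: float, ru_per_write: float) -> float:
    """Estimate request units per second: each operation rate times its RU cost."""
    return reads_per_sec * ru_per_read + writes_per_sec * ru_per_write


# Illustrative numbers only; substitute the RU charges measured for your workload.
estimate = required_throughput(reads_per_sec=500, ru_per_read=1.0,
                               writes_per_sec=100, ru_per_write=5.0)
print(estimate)  # 1000.0
```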


Thursday, 13 September 2018

Hortonworks Data Analytics Studio and Open Hybrid Architecture

Hortonworks has announced the general availability of Hortonworks Data Analytics Studio (DAS), a new service intended to make business analysts more productive by delivering faster insights from data at scale. DAS is part of the Hortonworks DataPlane Service (DPS), which enables businesses to discover, manage, govern and now optimize their data spread across hybrid environments. DAS leverages open-source technologies such as Apache Hive to share and extend the value of a modern data architecture in heterogeneous environments. It includes a useful database heat map.




Hortonworks has also shared the Open Hybrid Architecture Initiative, designed to enable big data workloads to run in a hybrid manner across on-premises, multi-cloud and edge architectures.


The Open Hybrid Architecture Initiative will

  • Decouple storage, with both file system interfaces and an object-store interface to data.
  • Containerize compute resources for elasticity and software isolation.
  • Share services for metadata, governance and security across all tiers.
  • Provide DevOps/orchestration tools for managing services/workloads via the “infrastructure is code” paradigm to allow spin-up/down in a programmatic manner.
  • Designate workloads specific to use cases such as EDW and data science, rather than sharing everything in a multi-tenant Hadoop cluster.