And there is my name.
Chaos, complexity, curiosity and database systems. A place where research meets industry
Welcome
Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP
"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein
"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein
Sunday, 30 September 2018
MVP Wall
At Microsoft Ignite 2018 Microsoft devoted an entire wall to list all the names of the MVPs. I felt very humbled to have my name on the MVP wall with so many amazing people. It is such a privilege to be a part of the Microsoft Data Community. #datafamily #MVPbuzz
And there is my name.
And there is my name.
Tuesday, 25 September 2018
SQLBits 2019
SQLBits 2019 has been announced. It is in the heart of Manchester. The last time it was in Manchester was in 2009. I am already excited about this next event.
Monday, 24 September 2018
Azure SQL Database Managed Instance GA
At Microsoft Ignite it was announced that Azure SQL Database Managed Instance will be general availability on October 1, 2018.
Azure
SQL Database Managed Instance is a deployment model of Azure SQL
Database. This service enables customers to migrate existing databases to
a fully managed PaaS cloud environment. It is possible to use the Data
Migration Service (DMS) in Azure to lift and shift their on-premises SQL Server. This
can be a useful tool to use for secure databases that reduces the
management overhead. including automatic patching and version updates, automated
backups and high-availability.
Reading
Azure SQL Database Managed Instance, General Purpose tier general availability
Azure Database Migration Service and tool updates – Ignite 2018
Reading
Azure SQL Database Managed Instance, General Purpose tier general availability
Azure Database Migration Service and tool updates – Ignite 2018
SQL Server 2019, Big Data and AI
At Microsoft Ignite SQL Server 2019 was launched. An amazing product for the future combining SQL Server 2019 with big data and analytics. It is great to see the combining of multiple tools in once place, a one stop shop for large and small data, structured and unstructured and from multiple sources.
There are 3 major components
to SQL Server 2019.
The creation of a data
virtualization layer that handles complexity of all data sources and format. Enabling the integration of structured and unstructured
data without moving the data.
The streamlining of data management
with SQL Server 2019 big data clusters deployed in Kubernetes integrating HDFS
and Spark. The architecture is explained in more depth here and looks like
The creation of a complete AI
platform that can use Spark to analyse both structured and unstructured data
anywhere, use SQL Server machine learning services and SparkML.
In summary SQL Server big data clusters allow you to deploy scalable clusters of SQL Server, Spark, and HDFS Docker containers running on Kubernetes.
Read More
Sunday, 23 September 2018
Microsoft Ignite - watch live
Microsoft Ignite is happening this week. Unfortunately I won't be there but the keynote and some deep dive sessions will be streamed live. I am looking forward to seeing what Microsoft CEO Satya Nadella shares as his vision for the future of tech. I will be interested to see what other tools and technologies will play an important part in the next year and be excited to see how data fits into this forthcoming vision.
To watch the live stream, the meeting invite is for
Start Time: 09:00 - 17:15 (UK time 14:00)
Date: Monday 24 September 2018
Time Zone: Eastern Time (US and Canada)
Friday, 14 September 2018
Azure Cosmos DB multi-model database
Azure Cosmos DB has to be one of my favorite databases due to the breadth of available database types, its choice of consistency models and elastic scale out.
An introduction can be read here.
A definition for each of these types of databases is given.
Key-value
A key-value pair (KVP) is a set of two linked data items: a key, which is a unique identifier for some item of data, and the value, which is either the data that is identified or a pointer to the location of that data. Key-value pairs are frequently used in lookup tables, hash tables and configuration files.
https://searchenterprisedesktop.techtarget.com/definition/key-value-pair
Column
A column-oriented DBMS (or columnar database management system) is a database management system (DBMS) that stores data tables by column rather than by row.
https://en.wikipedia.org/wiki/Column-oriented_DBMS
Document
Document stores, also called document-oriented database systems, are characterized by their schema-free organization of data.That means records do not need to have a uniform structure, i.e. different records may have different columns. The types of the values of individual columns can be different for each record. Columns can have more than one value (arrays). Records can have a nested structure. E.g. MongoDB
https://db-engines.com/en/article/Document+Stores
Graph
A graph database, also called a graph-oriented database, is a type of NoSQL database that uses graph theory to store, map and query relationships. Every node in a graph database is defined by a unique identifier, a set of outgoing edges and/or incoming edges and a set of properties expressed as key/value pairs.
https://whatis.techtarget.com/definition/graph-database
There is a useful capacity planer that looks at request units throughput per second, request unit consumption and the amount of data storage needed by your application.
A definition for each of these types of databases is given.
Key-value
A key-value pair (KVP) is a set of two linked data items: a key, which is a unique identifier for some item of data, and the value, which is either the data that is identified or a pointer to the location of that data. Key-value pairs are frequently used in lookup tables, hash tables and configuration files.
https://searchenterprisedesktop.techtarget.com/definition/key-value-pair
Column
A column-oriented DBMS (or columnar database management system) is a database management system (DBMS) that stores data tables by column rather than by row.
https://en.wikipedia.org/wiki/Column-oriented_DBMS
Document
Document stores, also called document-oriented database systems, are characterized by their schema-free organization of data.That means records do not need to have a uniform structure, i.e. different records may have different columns. The types of the values of individual columns can be different for each record. Columns can have more than one value (arrays). Records can have a nested structure. E.g. MongoDB
https://db-engines.com/en/article/Document+Stores
Graph
A graph database, also called a graph-oriented database, is a type of NoSQL database that uses graph theory to store, map and query relationships. Every node in a graph database is defined by a unique identifier, a set of outgoing edges and/or incoming edges and a set of properties expressed as key/value pairs.
https://whatis.techtarget.com/definition/graph-database
The five consistency
levels offer predictable low latency
guarantees and multiple well-defined relaxed consistency models.
Consistency Levels and guarantees
Consistency Level
|
Guarantees
|
Strong
|
Linearizability.
Reads are guaranteed to return the most recent version of an item.
|
Bounded
Staleness
|
Consistent
Prefix. Reads lag behind writes by at most k prefixes or t interval
|
Session
|
Consistent
Prefix. Monotonic reads, monotonic writes, read-your-writes,
write-follows-reads
|
Consistent
Prefix
|
Updates
returned are some prefix of all the updates, with no gaps
|
Eventual
|
Out of
order reads
|
There is a useful capacity planer that looks at request units throughput per second, request unit consumption and the amount of data storage needed by your application.
Thursday, 13 September 2018
Hortonworks Data Analytics Studio and Open Hybrid Architecture
Hortonworks has announced the general availability of Hortonworks Data Analytics Studio (DAS). A new service to enable enhanced productivity of business analysts by delivering faster insights from data at scale. DAS is part of the Hortonworks DataPlane Service (DPS). DPS enables businesses to discover, manage, govern and now optimize their data spread across hybrid environments. DAS leverages open-source technologies such as Apache Hive to share and extend the value of a modern data architecture in heterogeneous environments. It includes a useful database heat map.
Hortonworks have also shared the Open Hybrid Architecture Initiative, designed to enable big data workloads to run in a hybrid manner across on-premises, multi-cloud and edge architectures.
The Open Hybrid Architecture initiative will
Hortonworks have also shared the Open Hybrid Architecture Initiative, designed to enable big data workloads to run in a hybrid manner across on-premises, multi-cloud and edge architectures.
The Open Hybrid Architecture initiative will
- De-coupling storage, with both file system interfaces and an object-store interface to data.
- Containerizing compute resources for elasticity and software isolation.
- Sharing services for metadata, governance and security across all tiers.
- Providing DevOps/orchestration tools for managing services/workloads via the “infrastructure is code” paradigm to allow spin-up/down in a programmatic manner.
- Designating workloads specific to use cases such as EDW, data science, rather than sharing everything in a multi-tenant Hadoop cluster.
AI the art of the possible
During SQL Saturday Cambridge I attended a session by Terry McCann on using AI to write a session submission to SQL Saturday. It was a great session and I would recommend you attending it if you get a chance.
He mentioned also the book Harry Potter and the Portrait of what Looked Like a Large Pile of Ash which was written by an AI bot. I hadn't come across this before but it lets you see the art of the possible in the future.
In the new data world it is important to understand the difference between AI, Machine Learning and Deep Learning.
Image : https://www.argility.com/argility-ecosystem-solutions/industry-4-0/machine-learning-deep-learning/
Then breaking this down further the differences between how machine learning works and deep learning is shown here. Deep learning is really a black box.
Image : https://www.upwork.com/hiring/for-clients/log-analytics-deep-learning-machine-learning/
Terry mentioned a free book to read to learn more Neural Networks and Deep
Learning.
He mentioned also the book Harry Potter and the Portrait of what Looked Like a Large Pile of Ash which was written by an AI bot. I hadn't come across this before but it lets you see the art of the possible in the future.
Image: https://imgur.com/gallery/gkLFz
Wednesday, 12 September 2018
Azure Data Factory Data Flows
There
is a preview for a new feature called 'Data Flows' that enables visual
data transformation capability within Azure Data Factory (ADF). This is scaled
out using Databricks. This is available in the ADF Resource Explorer. Data Flow should
enable a deeper understanding of your data. Monitoring and inspecting pipelines
can be done in ADF. The transformations occur inside Azure Databricks which
uses the Spark engine to distribute data across partitions.
References
Microsoft Azure Data Factory documentation for Data Flow
during preview is available here: https://aka.ms/adfdataflowdocs
Tuesday, 11 September 2018
Azure DevOps Services
Microsoft have announced new Azure
DevOps services than span the breadth of the development lifecycle to help
developers ship software faster and with higher quality. Azure DevOps includes:
CI/CD that works with any language,
platform, and cloud. Connect to GitHub or any Git repository and deploy
continuously.
Powerful work tracking with Kanban
boards, backlogs, team dashboards, and custom reporting.
Maven, npm, and NuGet package feeds
from public and private sources.
Unlimited cloud-hosted private Git
repos for your project. Collaborative pull requests, advanced file management,
and more.
All in one planned and exploratory
testing solution.
PowerBI Report Server
Power
BI Report Server is similar to SQL Server Reporting Services (SSRS). Power BI Report
Server is on premises, and hosts paginated reports in addition to the Power BI features. There are several
types of reports
- Paginated (standard SSRS type reports)
- Interactive (PowerBI Desktop)
- Mobile
- Analytical (Excel)
PowerBI Pro enables sharing and collaboration with the following features
- Build dashboards that deliver a 360-degree, real-time view of the business
- Keep data up-to-date automatically, including on-premises sources
- Collaborate on shared data
- Audit and govern how data is accessed and used
- Package content and distribute to users with app
There is increased functionality within PowerBI
Pro. A comparison between the free PowerBI desktop edition and Pro version is in the table below. Only Pro users
can publish content to app workspaces, consume apps without Premium capacity,
share dashboards and subscribe to dashboards and reports. Free users can
connect to all data sources using connectivity options such as DirectQuery,
live connection and use the data gateway.
Licensing Power BI
Report Server
Power BI Report Server is available in two different license
modes:
- Power BI Premium
- SQL Server Enterprise
Edition with
Software Assurance.
A Power BI Premium license enables creation of a hybrid deployment
(cloud and on-premises).
Capacity
The session delivered
by Isabelle Van Campenhoudt at SQL Saturday Cambridge shared this useful capacity planning paper. It is based on the sample Power BI Report Server topology
using virtual machines.
Monday, 10 September 2018
Changing our concept of the world
In Ackoff’s
Best His Classic Writings on Management (1999), he shares a way of thinking
that as fascinated me for years. It can be used for problem solving and has
helped me manage database systems for years. It is about looking at things in a
holistic way. This way of thinking, synthesis,
often yields contradictory or conflicting results to traditional analytical
thinking. Systems thinking, as it is known, helps expand your viewpoint
considering the holistic nature of the environment. Ackoff shared the
difference between the two modes of thinking:
Machine Age
Thinking (Analysis)
|
Synthesis (Systems
Thinking)
|
Analysis precedes synthesis
|
Synthesis precedes
analysis
|
Focuses on
structure
|
Focuses on function
|
Reveals how things
work
|
Reveals why things
operate as they do
|
Yields knowledge
|
Yields understanding
|
Enables us to
describe
|
Enables us to
explain
|
Looks into things
|
Looks out of things
|
Iteration of the
parts of the thing to be explained
|
Iteration of the
parts of the thing to be explained, interactions between things in its
environment and with its environment itself concerned with functional
interaction of the parts of system
|
Break down large
complex problems into solvable or manageable parts outputs assembled into a
solution of the whole
|
Some of the best
solution obtained from the parts taken separately is not the best solution of
the whole.
|
The best performance
of the whole can be reduced to the sum of the best performance of its parts
taken separately
|
Asserts this is not
possible
|
Reduces the focus
of the investigator
|
Expands it focus of
the investigator
|
As a result, I
never look only at the immediate problem but consider all of the
interconnecting parts.
Sunday, 9 September 2018
SQLSaturday Cambridge 2018 - Britannia Edition!
I attended PASS SQL Saturday Cambridge 8 September 2018. This was held at The Møller Centre, Churchill College, Storey's Way, Cambridge. Another great training day packed full of lots of new and exciting data technology.
I attended some amazing sessions. These were
- Data Classification in SQL Server and Azure SQL Database - Mark Pryce-Maher
- Azure Cosmos DB - What you need to know to build globally distrib - Satya SK Jayanty
- Using AI to write session submission to SQLSaturday - Terry McCann
- Power BI server and Office Online server, modernize your on-premises BI approach. - Isabelle Van Campenhoudt
- Bricking it: An Introduction to Azure Databricks - Simon Whiteley
- SQLOpsStudio Vs SSMS - There can be only one - Warwick Rudd
- Branding Yourself for a Dream Job - Steve Jones
There was some very interesting learning, that I took away, which I will write about in a follow up blog.
Saturday, 8 September 2018
SQL Relay: the travelling conference
Registration is open for SQL Relay. This is a conference with a difference. It tours round the UK for 5 consecutive days bringing international and MVP speakers to your local area. It is an amazing conference to attend with lots of sessions to choose from. This years events are at
Newcastle - Monday 8 October
https://sqlrelay2018-newcastle.eventbrite.co.uk/
Leeds - Tuesday 9 October
https://sqlrelay2018-leeds.eventbrite.co.uk/
Birmingham - Wednesday 10 October
https://sqlrelay2018-birmingham.eventbrite.co.uk/
Reading - Thursday 11 October
https://sqlrelay2018-reading.eventbrite.co.uk/
Bristol - Friday 12 October
https://sqlrelay2018-bristol.eventbrite.co.uk/
Subscribe to:
Posts (Atom)