Welcome

Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein



Friday, 9 November 2018

SQLBits 2019: The Great Data Heist

SQLBits 2019 registration is open. Next year it runs between 27 February 2019 and 02 March 2019 at Manchester Central. There are many amazing reasons to attend this data conference. Hope to see you there.

Thursday, 8 November 2018

PASS Summit 2018 Day 2 Keynote

The day 2 keynote was given by Microsoft Data Platform CTO Raghu Ramakrishnan on the internals of the next evolution in engine architecture, which will form the foundation for the next 25 years of the Microsoft data platform.

It covered Azure SQL DB Hyperscale. The changing data landscape poses many challenges: how to leverage unbounded storage and elastic compute, and how to deal with the perennial problems that size-of-data operations are slow, recovery times are long and painful, and network latencies need to be masked.

There isn’t one database system that can do it all well. Users need to move data across systems, which is slow and complicates governance.

The most challenging aspects of state management are ACID properties, transactional updates, a high velocity of data changes and the lowest response times. These issues led to SQL Hyperscale.

There were various technical themes: full separation of compute and storage, the complexity of the quorum (log), uniquely skewed access patterns, and the network simply extending the memory hierarchy. He shared a newsflash about Multi-Version Timestamp concurrency control (CC) rules resulting from two-phase locking and MVCC (Hekaton) and lock-free data structures.
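To make the multi-version idea a little more concrete, here is a minimal Python sketch of timestamp-based multi-versioning. It is only a toy under my own assumptions, not how Hekaton or Hyperscale is implemented: the class `VersionedStore` and its methods are hypothetical names, every write is treated as committed immediately, and the "timestamp" is just a counter.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Toy multi-version store: every write appends a timestamped version."""
    clock: int = 0
    versions: dict = field(default_factory=dict)  # key -> list of (commit_ts, value)

    def write(self, key, value):
        # Assumption: each write commits at once and takes the next logical timestamp.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        # A reader remembers the clock value at the moment it starts.
        return self.clock

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the reader's snapshot.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = VersionedStore()
store.write("balance", 100)
reader_ts = store.snapshot()   # an older reader starts here
store.write("balance", 250)    # a later write does not block that reader

print(store.read("balance", reader_ts))         # 100
print(store.read("balance", store.snapshot()))  # 250
```

The point of the sketch is that the older reader keeps seeing the value as of its own timestamp without taking any locks, while new readers see the latest version.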

Persistent Version Store (PVS)
This in-depth talk was packed with technical content about SQL Hyperscale, and I would recommend watching the recording to learn about this new product and era of database delivery.

Wednesday, 7 November 2018

PASS Summit 2018 Keynote Day 1

The first keynote of PASS Summit, entitled "SQL Server and Azure Data Services: Harness the ultimate hybrid platform for data and AI", was delivered by Rohan Kumar.

Customer priorities for a modernized data estate are: modernizing on-premises, modernizing to cloud, building cloud native apps and unlocking insights.

The announcements follow:

SQL Server 2019
SQL Server 2019 Public Preview is a great way to celebrate the 25th anniversary of SQL Server.

Big data clusters are introduced, combining Apache Spark and Hadoop into a single data platform called SQL Server. This brings together the power of Spark and SQL Server over the relational and non-relational data sitting in SQL Server, HDFS and other systems such as Oracle, Teradata and Cosmos DB.

There are new capabilities around performance, availability and security for mission critical environments, along with the capability to leverage hardware innovations like persistent memory and enclaves.

Hadoop, Apache Spark, Kubernetes and Java are native capabilities in the database engine.

Accelerated Database Recovery (ADR) was demonstrated and is incredible. It is in public preview. The benefits of ADR are:
  • Fast and consistent Database Recovery
  • Instantaneous Transaction rollback
  • Aggressive Log Truncation
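As a rough illustration of why rollback can be close to instantaneous when row versions are kept (the Persistent Version Store idea from the day 2 keynote), here is a toy Python sketch. It is not SQL Server's implementation, just my own simplified model: the class `ToyVersionedTable` and its methods are hypothetical names, and commit tracking is left out to keep it short.

```python
class ToyVersionedTable:
    """Toy model: rows keep all their versions, and rollback only flags the transaction."""

    def __init__(self):
        self.rows = {}        # key -> list of (txn_id, value), oldest first
        self.aborted = set()  # ids of rolled-back transactions

    def write(self, txn_id, key, value):
        # Append a new version instead of overwriting the old one.
        self.rows.setdefault(key, []).append((txn_id, value))

    def rollback(self, txn_id):
        # One set insertion, however many rows the transaction touched.
        self.aborted.add(txn_id)

    def read(self, key):
        # Return the newest version not written by an aborted transaction.
        # Simplification: in-flight (uncommitted) writes are visible in this toy.
        for txn_id, value in reversed(self.rows.get(key, [])):
            if txn_id not in self.aborted:
                return value
        return None


table = ToyVersionedTable()
table.write(1, "a", "original")

# Transaction 2 makes a large number of changes, then is rolled back.
for i in range(1000):
    table.write(2, f"k{i}", "bulk")
table.write(2, "a", "changed")
table.rollback(2)

print(table.read("a"))   # original
print(table.read("k5"))  # None
```

In this model the cost of the rollback does not depend on the size of the transaction; the stale versions are simply ignored by readers and could be cleaned up later in the background.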

Azure HDInsight 4.0

HDInsight 4.0 is now available in public preview.

There are several Apache Hadoop 3.0 innovations. Hive LLAP (Low Latency Analytical Processing, known as Interactive Query in HDInsight) delivers ultra-fast SQL queries. The performance metrics provide useful insight.

There is integration with Power BI DirectQuery, Apache Zeppelin and other tools. To learn more, watch HDInsight Interactive Query with Power BI.

Data quality and GDPR compliance enabled by Apache Hive transactions
Improved ACID capabilities handle data quality (update/delete) issues at row level. This means that GDPR compliance requirements can now be met with the ability to erase data at row level. Spark can read and write to Hive ACID tables via Hive Warehouse Connector.

Apache Hive LLAP + Druid = single tool for multiple SQL use cases

Druid is a high-performance, column-oriented, distributed data store, which is well suited for user-facing analytic applications and real-time architectures. Druid is optimized for sub-second queries to slice-and-dice, drill down, search, filter, and aggregate event streams. Druid is commonly used to power interactive applications where sub-second performance with thousands of concurrent users is expected.

Hive Spark Integration
Apache Spark gets updatable tables and ACID transactions with Hive Warehouse Connector

Apache HBase and Apache Phoenix
Apache HBase 2.0 and Apache Phoenix 5.0 get new performance and stability features, and all of the above have enterprise grade security.

Azure
Azure Event Hubs for Kafka is generally available.
Azure Data Explorer is in public preview.

Azure Databricks Delta is in public preview
  • Connect data scientists and engineers
  • Prepare and clean data at massive scales
  • Build/train models with pre-configured ML

Azure Cosmos DB multi-master replication was demoed with a drawing app, Azure Cosmos DB PxDraw.
Azure SQL DB Managed Instances will reach General Availability (GA) on 1 December. This provides Availability Groups managed by Microsoft.

Power BI

The new Dataflows feature is an enabler for self-service data prep in Power BI.

Power BI Desktop November Update
  • Follow-up questions for Q&A explorer: it is possible to ask follow-up questions inside the Q&A explorer pop-up, which take into account the previous questions you asked.
  • Copy and paste between PBIX files
  • New modelling view makes it easier to work with large models.
  • Expand and collapse matrix row headers


Friday, 2 November 2018

Future Decoded Day 2

The live stream updates included a great picture summary of the keynote.

The Day 2 Keynote at Future Decoded by Satya Nadella was inspiring. He talked around this simple, self-evident formula, Tech intensity = (Tech adoption) ^ Tech capability, and around the Intelligent Cloud and Intelligent Edge in an era of digital transformation.

In any society you need three actors for growth: government, academia, and entrepreneurs and the private sector.

The core areas to consider and build on in the future are:

Privacy
We need to protect privacy as a fundamental human right. Trust and GDPR are important to achieve this.

Security
We need to act with collective responsibility across the tech sector to help keep the world safe. Cyber security threat detection and removal are core capabilities to embed in any platform. Microsoft have been leading the Tech Accord.

Ethical AI 
We need to ask ourselves not only what computers can do, but what computers should do.

Thursday, 1 November 2018

Future Decoded The AI Future

Future Decoded at ExCeL London, 31 October - 1 November, is an exciting place to be. The event is packed full of AI innovations. AI is groundbreaking and will change the face of the marketplace. It needs substantial learning for businesses and people to maximise its capability. Three takeaways from today:

Maximising the AI Opportunity



Artificial intelligence is changing the UK so fast that nearly half of today's business models won’t exist by 2023, a new Microsoft report has revealed. The article can be read here: UK companies at risk of falling behind due to a lack of AI strategy, Microsoft research reveals.
The report, Maximising the AI Opportunity, shares insights on the potential of AI, including skills and learning, based on a survey of and interviews with thousands of UK leaders.

Microsoft AI Academy
A new addition to Microsoft's commitment to advancing digital skills in the UK, the Microsoft AI Academy will run face-to-face and online training sessions for business and public sector leaders, IT professionals, developers and start-ups.

aka.ms/learn

Microsoft Research and Cambridge University

Some amazing news from Microsoft is that it is partnering with the University of Cambridge to boost the number of AI researchers in the UK. The Microsoft Research-Cambridge University Machine Learning Initiative will provide support for Ph.D. students at the world-leading university, and offer a postdoctoral research position at Microsoft Research Lab, Cambridge. Our aim is to realise artificial intelligence’s potential in enhancing the human experience and to nurture the next generation of researchers and talent in the field.

Read More:
Microsoft Research and Cambridge University strengthen their commitment to AI innovation and the field’s future leaders