Welcome

Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein



Thursday 22 August 2019

Microsoft ML for Apache Spark

Microsoft Research have announced a new version of Microsoft ML for Apache Spark, an open-source, distributed ML and microservice library. v0.18 brings Vowpal Wabbit on Spark, Speech to Text & more!

Microsoft Machine Learning for Apache Spark (MMLSpark) is an ecosystem of enhancements that expand the Apache Spark distributed computing library to tackle problems in Deep Learning. It enables sending streaming data to Power BI.
Website: http://aka.ms/spark Paper: http://aka.ms/spark-paper



Global AI Nights 2019


The Global AI Night is a free evening event organized in London by community members who are passionate about Artificial Intelligence on Microsoft Azure. It takes place at The Microsoft Reactor in London on Thursday September 5, 2019, 5:45 PM – 10:00 PM. Register here.

Friday 16 August 2019

Database Trends Awards




Database Trends and Applications have named the best relational database and the best big data platform.






Best relational database: SQL Server
"According to Craig S. Mullins, president & principal consultant, Mullins Consulting, Inc, relational continues to dominate: IDC forecasts that relational DBs will still account for more than 80% of the total operational database market through 2022, and Gartner forecasts that through 2020, relational technology will continue to be used for at least 70% of new applications and projects."

Best big data platform: Cloudera Enterprise Data Cloud

"To leverage the immense power of their data, organizations need a solid strategy that incorporates everything from security to data governance to the right big data technologies. Enabling both on-prem and cloud deployments—or a hybrid strategy—big data platforms today support data warehouses, data lakes, data science, engineering, machine learning, myriad database management systems, and much more.  And while Hadoop is a key element of big data platforms today, there are also many other open source components, support capabilities, and advanced features that round out a big data platform to give data-driven companies the big data capabilities they need"

Wednesday 7 August 2019

Discover Datasets

There are many thousands of data repositories around the world, and to make this data easier to access Google have launched a Dataset Search service.

https://toolbox.google.com/datasetsearch



The service aims to be a companion of sorts to Google Scholar, the company’s popular search engine for academic studies and reports.

Read more about the service here.

Friday 2 August 2019

The Big Data Problem

The article "The real big-data problem and why only machine learning can fix it" and the video from the MIT CDO conference in Cambridge, MA contain an interesting discussion on why ETL and MDM don't scale and why applying a schema later doesn't deliver usable data. The key is using machine learning to classify and prep data.