Tuesday 29 June 2021

Data Culture

Data culture is a term that has been talked about for many years. It is about using data to drive an organization's decisions. McKinsey state there are seven principles that underpin a healthy data culture:

  • data culture is a decision culture
  • data culture is a C-Suite imperative, and that of the board
  • the democratization of data
  • data culture puts risk at its core
  • culture catalysts, with people bridging data science and on-the-ground operations
  • sharing data beyond company walls, shifting from in-house competitive advantage to assembling the breadth of the best data assets in the market
  • marrying talent and culture 

Alation have started publishing quarterly State of Data Culture reports. The latest report is from June 2021. Within that report they share a Data Culture Index (DCI), which is a quantitative assessment of how well organisations are positioned to enable data-driven decision making. The index is based on data search and discovery, data literacy and data governance. I do think that data culture is also about data ethics.

The report states that the top initiative to foster data culture is managing data governance and improving data quality.

Monday 28 June 2021

Distinguished Engineer and Research Fellows

I have always been fascinated by the job roles of distinguished engineer and research fellow. To me it seems these roles transcend research and industry. Distinguished engineer is a title applied to someone who is thought (by those conferring the title) to have achieved noteworthy technical and professional accomplishments while working as an engineer. A research fellow holds an academic research position at a university or research institution, usually as a member of the academic staff or faculty.

I came across a slide show on the behaviours and qualities of an IBM Distinguished Engineer. An interesting quote on the diagram is "I want to be distinguished from the rest; to tell the truth, a friend to all mankind is not a friend for me".

Distinguished Engineers and IBM Fellows are world-famous inventors and theorists. A Distinguished Engineer has a unique and fascinating job that transcends many boundaries. A few of the attributes they mention:
  • Eminence takes responsibility 
  • Be learned, erudite 
  • Integrity and trust in all things
  • Learn from your mistakes
  • Apply common sense
  • Make decisions
  • Have a point of view
  • Provide hope
  • Inspire others
  • Be collaborative
  • Be optimistic and cheerful
  • Adapt proactively
  • Be curious and fearless
  • Build a track record and keep notes!
  • Know yourself and be true to you
  • Enhance your communication skills and image
  • Listen actively
  • Be a mentor and coach
  • Lead diverse teams
  • Be a member of a professional body 

Tuesday 22 June 2021

Combining research and industry learning

I am very privileged to have an article about my career published in the ACM journal: "Computing enabled me to... obtain a PhD and a Career in Data".

The DOI reference for my paper is https://doi.org/10.1145/3464919

Communications of the ACM, Volume 64, Issue 7, p. 7

About the journal

ACM, the world's largest educational and scientific computing society, delivers resources that advance computing as a science and a profession. ACM provides the computing field's premier Digital Library and serves its members and the computing profession with leading-edge publications, conferences, and career resources. They see a world where computing helps solve tomorrow’s problems – where we use our knowledge and skills to advance the profession and make a positive impact.


Being part of the research world is a huge part of who I am and it is very important to have research and industry working together to help shape the future of data innovation.

Wednesday 16 June 2021

Data Toboggan Cool Runnings 21 Summary

The 12-hour event ran on 12 June. We used Teams Live Events. The tool has evolved since we used it in January, with some plus points and some behaviour we hadn't seen before. There were some amazing speakers from around the world: Finland, Malta, Australia, Hungary, Serbia, UK (including Scotland), The Netherlands, USA (Seattle, New Mexico), China (Shanghai), Canada and Norway.

It was amazing having Robin Sutara, the Microsoft UK Chief Data Officer, present the keynote. We were also privileged to have a number of product group speakers sharing in-depth details about the Azure Synapse features. Then there were community speakers and MVPs speaking. All in all, a lot of content, with the 45-minute main sessions wrapped with 5-minute pre-recorded lightning talks.

We had our first expert panel discussion session, which I found really interesting and informative. People were able to ask questions in advance using Microsoft Forms and during the event using Slido. I would like to thank the people who watched the presentations live on the day and contributed to such a lively chat on Slack.

I was interested to see where our Twitter followers are located. It is interesting to see where Azure Synapse is being used or investigated.

There were many discussions on the Slack channel. It is nice to have free, open discussions on Azure Synapse all day. The links shared in the Slack channel were:

Presentation tricks : http://blogs.lobsterpot.com.au/2021/01/30/presentation-trickery-online-glassboard-like-lightboard-but-using-just-free-software/ 

Set the UTF-8 collation after the database has been created, as described by Jovan here: https://techcommunity.microsoft.com/t5/azure-synapse-analytics/always-use-utf-8-collations-to-read-utf-8-text-in-serverless-sql/ba-p/1883633

To learn more about the distributed execution flow that serverless uses, read this VLDB article: https://www.vldb.org/pvldb/vol13/p3204-saborit.pdf

Andy C's slides for Turbocharge are here: https://www.datahai.co.uk/power-bi/turbocharge-power-bi-using-azure-synapse-analytics-session/

Using file metadata in queries : https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/query-specific-files

Best Practices for serverless SQL pool https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-serverless-sql-pool

Article that explains cost management for serverless pools in detail: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/data-processed

Synapse Link for Dataverse: https://docs.microsoft.com/en-us/powerapps/maker/data-platform/export-to-data-lake

Roman Pijacek: Cost control for serverless SQL pool - the source code of the views I'll be showing during the lightning talk is available at https://github.com/RomanPijacek/DataToboggan/tree/main/CostControlForServerlessSqlPool

The official MS doc that describes Cost management for serverless SQL pool: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/data-processed

Craig Porteous: GitHub repo for his session


Data Profiler Summary Stats in ADF Data Flows: https://techcommunity.microsoft.com/t5/azure-data-factory/how-to-save-your-data-profiler-summary-stats-in-adf-data-flows/ba-p/1243251

Data Factory to Synapse user voice: https://feedback.azure.com/forums/307516-azure-synapse-analytics/suggestions/41026642-allow-to-share-a-self-hosted-ir-from-data-factory

Well-Architected Framework session slides  https://www.datahai.co.uk/synapse-analytics/applying-the-azure-well-architected-framework-to-azure-synapse-analytics-session/

Mark PM: SQLPackage.exe https://docs.microsoft.com/en-us/sql/tools/sqlpackage/sqlpackage?view=sql-server-ver15 

SqlPackage for Azure Synapse Analytics - SQL Server https://docs.microsoft.com/en-us/sql/tools/sqlpackage/sqlpackage-for-azure-synapse-analytics?view=sql-server-ver15

Wolfgang: First preview this Summer https://devblogs.microsoft.com/visualstudio/visual-studio-2022/

From Drew Skwiers-Koballa  "Build a Data Warehouse with SQL Database Projects": https://github.com/dzsquared/synapse-sqlproj-demo

Finishing with the Azure Synapse Analytics Blog: https://techcommunity.microsoft.com/t5/azure-synapse-analytics/bg-p/AzureSynapseAnalyticsBlog 

Tuesday 8 June 2021

Data Toboggan - Cool Runnings line up

We have an amazing line up of speakers and subject matter experts. 

We have a lot of different elements in the summer edition of Data Toboggan.

  • Our keynote will be delivered by Robin Sutara, the Microsoft UK Chief Data Officer. I'm really excited to have the opportunity to listen to Robin kick off our event
  • Ask the Experts panel from Microsoft and the Community
  • A Cloud Skills Challenge on Azure Synapse, running from 1st June 2021 to 23rd June 2021. The Cloud Skills Challenge for UK residents is here: https://aka.ms/Challenge/DataToboggan and the terms and conditions are available here: https://aka.ms/DataToboggan/terms-conditions
  • Members of the Microsoft Product group sharing their knowledge of the latest features and more
  • MVPs and community members who are passionate about Azure Synapse
  • Lightning talks for a light break between sessions. We have some fun sessions, short topics and interesting research talks.  
  • And most importantly you the community attending and sharing in our excitement for Azure Synapse.

We hope you enjoy the day around the globe on Saturday 12 June, from 8am to 8pm (BST).

Register now: http://bit.ly/DT-CRRegister 

Agenda: https://bit.ly/DT-CRAgenda

#AzureSynapse @Azure_Synapse #MVPBuzz

Read: Azure Synapse: A Single Pane of Glass

Friday 4 June 2021

Data Toboggan - Ask the Experts

Data Toboggan are really excited to have an Ask the Experts session with experts from Microsoft and the Community. Submit your questions now: http://bit.ly/DTAsktheExperts Register: http://bit.ly/DT-CRRegister

The experts

  • Wee Hyong Tok (@weehyong), Principal Group Program Manager at Microsoft
  • Shirley Wang, Principal Group Program Manager at Microsoft
  • Mark Kromer (@KromerBigData), Principal Program Manager at Microsoft
  • Abhishek Narrain (@narainabhishek), Senior Program Manager at Microsoft
  • Linda Wang, Program Manager at Microsoft
  • Leslie Andrews (@landrews5807), Lead Data Architect at 3Cloud
  • Rayis Imayev (@RayisImayev), Senior Data Engineer, Global Investments at OMERS
  • Kamil Nowinski (@NowinskiK), Group Manager & Analytics Architect at Avanade UK

See the post by Cathrine Wilhelmsen: https://bit.ly/3fO9aO7

To take part in an amazing day of sessions for 12 hours of Azure Synapse on Saturday 12 June, register now: http://bit.ly/DT-CRRegister

Agenda: https://bit.ly/DT-CRAgenda

Tuesday 1 June 2021

Data Toboggan Cloud Skills Challenge

Data Toboggan Cool Runnings has its next 12-hour conference on 12 June. To run alongside our event, we have teamed up with Microsoft to run a Cloud Skills Challenge on Azure Synapse. The details are below.

Challenge dates: starts 1st June 2021 and ends 23rd June 2021

The Cloud Skills Challenge is here: https://aka.ms/Challenge/DataToboggan

The terms and conditions are available here: https://aka.ms/DataToboggan/terms-conditions

Whilst our conference welcomes international speakers and attendees, please only enter the Cloud Skills Challenge if you are a UK resident.

Whilst on the learning path, please sign up for the conference to help you gain those new skills along the way.

Register now: http://bit.ly/DT-CRRegister

Agenda: https://bit.ly/DT-CRAgenda