Welcome

Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous; the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk. Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein



Sunday, 29 January 2023

Data Toboggan Winter Edition 2023 event summary


Data Toboggan winter edition ran on Saturday 28 January 2023. This was our third year and fifth conference. We started with a single 12-hour track, and this is the second conference we have run with two tracks of sessions. There were so many amazing speakers and interesting sessions.

This time our speakers came from 15 countries:
USA 10 speakers; Poland 2 speakers; Finland 1 speaker; UK 2 speakers; Germany 2 speakers; Peru 1 speaker; Denmark 2 speakers; India 1 speaker; Sweden 1 speaker; Netherlands 1 speaker; Canada 1 speaker; Brazil 1 speaker; Uruguay 1 speaker; New Zealand 1 speaker and Australia 1 speaker.

I was pleased to see lots of questions and chat in the rooms, and many links shared for further reading. To help people follow up on them, I have listed the links here:

Azure Room

Data Toboggan - YouTube – Watch, subscribe and like.

Data Toboggan Website for news of what is coming next

Data Toboggan | Conference Meetup group for conference registration

Data Toboggan Slide Preparation | User Group

We are grouping the sessions into categories to help people learn Azure Synapse Analytics: Azure Synapse Analytics Learning - Data Toboggan

We were asked several times whether we share the videos on the YouTube channel. We seek the speakers' approval to publish their sessions and usually aim to publish them around six weeks after the event. We cannot guarantee that all sessions will be available to watch online.

Other learning: https://events.sqlbits.com/2023/training-days

The Kusto Detective Agency: https://detective.kusto.io/

Recommendation: T-SQL Fundamentals by Itzik Ben-Gan is an awesome reference.

Doing Power BI The Right Way

https://sqlserverbi.blog/presentations/

Analytics Room

Azure Synapse Analytics Migration Guides - Azure Synapse Analytics | Microsoft Learn

Azure Synapse Compatibility Checks - Microsoft Community Hub

Known limitations and issues with Azure Synapse Link for SQL - Azure Synapse Analytics | Microsoft Learn

Dimensional Modeling Techniques - Kimball Group

Recommended data modelling reading: The Data Warehouse Toolkit, Star Schema: The Complete Reference and Agile Data Warehouse Design.

BEAM Templates — modelstorming.com https://modelstorming.com/templates

Brian Bønk Rueløkke: Slides and code for the Synapse CETAS, Views, Parquet can be found here:

public/Speaks/2023/2023-01-28 Data Toboggan at master · brianbonk/public (github.com)

Azure Synapse Link for Azure Cosmos DB, benefits, and when to use it | Microsoft Learn

MicrosoftLearning/DP-203-Data-Engineer (github.com)

Azure Synapse Analytics August Update 2022 (microsoft.com)

Spark SQL REPAIR TABLE syntax: https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-repair-table.html

Convert databricks notebooks.dbc into standard .py files - Stack Overflow

Synapse Room

The white paper is good documentation:

Azure Synapse Analytics security white paper: Network security

A favourite podcast of ours

Knee-deep in Tech Podcast (kneedeepintech.com)

Certification links:

Microsoft Certified: Azure Data Fundamentals - https://learn.microsoft.com/en-us/certifications/azure-data-fundamentals/

Microsoft Certified: Azure Data Engineer Associate - https://learn.microsoft.com/en-us/certifications/azure-data-engineer/?WT.mc_id=DP-MVP-5004032

Microsoft Certified: Azure Enterprise Data Analyst Associate - https://learn.microsoft.com/en-us/certifications/azure-enterprise-data-analyst-associate/?WT.mc_id=DP-MVP-5004032

Coming soon: The new Azure Enterprise Data Analyst Associate certification

Microsoft ESI - https://esi.microsoft.com/

Databricks Academy - https://www.databricks.com/learn/training/login

The exams change regularly, so for any exam it is best to review the study guide. For example, here is the DP-203 study guide:

https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4MbYT

Blog: https://www.KevinRChant.com
GitHub: https://github.com/kevchant

The PastPresentations GitHub repository can be found at https://github.com/kevchant/PastPresentations

For the differences between the OPENROWSET parser versions, see: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-openrowset


Comparison table is here: https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/understand-synapse-dedicated-sql-pool-formerly-sql-dw-and/ba-p/3594628

PVLDB Proceedings Template - WORD (this is the Polaris white paper)

How to use CETAS on serverless SQL pool to improve performance and automatically recreate it - Microsoft Community Hub

Synapse Spark Delta Time Travel - Microsoft Community Hub

Understand Synapse dedicated SQL pool (formerly SQL DW) and Serverless SQL pool - Microsoft Community Hub

An interesting question to think about: which term will dominate in the next five years, data lakehouse or data mesh?

Monitoring Synapse serverless SQL pool query history

Monitoring serverless SQL ended requests by using Log Analytics    

I would like to thank all the speakers for their time and amazing content. I am so grateful to fellow organisers Andy Cutler and Richard Munn and to our moderators, Justin Bird and Shabnam Watson. And thank you to our attendees for making it such a fun day. We are already excited about the next one in the summer.

Wednesday, 18 January 2023

Data Toboggan 2023 Winter Edition

This is our third year of running Data Toboggan. We are incredibly grateful for all the speakers and attendees who have made this happen. We are pleased to share the agenda. 

Register: https://bit.ly/DT23Register

Agenda:  https://bit.ly/DT23Agenda

Date: Saturday 28th January 2023




Tuesday, 10 January 2023

Best Practices for using the business glossary in Microsoft Purview


Many terms are used within a business, and for the business to function well it is important to have a common set of terms. There are a few things to think about:

Preparation Standards

  • Create a hierarchy of business terms
  • Have a naming standard for the glossary. Terms are case sensitive in Purview and allow white space
  • Check for duplicate terms
  • Avoid duplicate term names in different parent terms
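Because Purview treats names as case sensitive and keeps white space, "Customer" and "customer " would become two separate terms. A quick pre-import check can catch this along with the duplicate cases listed above. The sketch below is a minimal example, assuming the terms are in a CSV with hypothetical columns "Name" and "Parent Term Name"; rename these to match the template you actually use.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical column names; change them to match the template you import.
NAME_COL = "Name"
PARENT_COL = "Parent Term Name"

def check_terms(csv_text: str) -> dict:
    """Report likely naming problems in a glossary term CSV before import."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    exact = Counter()                   # (parent, name) -> occurrences
    parents_by_name = defaultdict(set)  # name -> parents it appears under
    spellings = defaultdict(set)        # case/space-normalised name -> raw names
    for row in rows:
        name = row[NAME_COL]
        parent = row.get(PARENT_COL) or ""
        exact[(parent, name)] += 1
        parents_by_name[name].add(parent)
        spellings[name.strip().casefold()].add(name)
    return {
        # Same name repeated under the same parent
        "duplicates": sorted(name for (_, name), n in exact.items() if n > 1),
        # Same name used under different parent terms
        "reused_across_parents": sorted(
            name for name, parents in parents_by_name.items() if len(parents) > 1
        ),
        # Names Purview would treat as distinct because it is case sensitive
        # and keeps white space, e.g. "Customer" vs "customer "
        "case_or_space_variants": sorted(
            tuple(sorted(names)) for names in spellings.values() if len(names) > 1
        ),
    }

SAMPLE = (
    "Name,Parent Term Name\n"
    "Customer,\n"
    "Customer,\n"
    "Revenue,Finance\n"
    "Revenue,Sales\n"
    "customer ,\n"
)

if __name__ == "__main__":
    print(check_terms(SAMPLE))
```

Normalising with `strip().casefold()` is only used to group likely clashes; the report still shows the raw spellings so you can decide which one your naming standard keeps.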

Term Templates

Using the term template sample.csv file is recommended when importing terms, although a custom template may be used.

Existing terms that are updated will be overwritten during import, so it is best to test this in a lab environment first.
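Since an import overwrites existing terms, it can help to preview which terms will change before running it. A minimal sketch, assuming you have an exported copy of the current glossary and the file you plan to import, both as CSV with the same hypothetical "Name" and "Parent Term Name" columns (the "Definition" column in the sample is also illustrative). A term is identified here by name plus parent, which is an assumption, not documented Purview behaviour.

```python
import csv
import io

def term_key(row: dict) -> tuple:
    # Assumption: a term is identified by its name plus its parent term
    return (row["Name"], row.get("Parent Term Name") or "")

def preview_import(exported_csv: str, import_csv: str) -> dict:
    """Classify each incoming term as new, overwritten (changed) or unchanged."""
    existing = {term_key(r): r for r in csv.DictReader(io.StringIO(exported_csv))}
    result = {"new": [], "overwritten": [], "unchanged": []}
    for row in csv.DictReader(io.StringIO(import_csv)):
        current = existing.get(term_key(row))
        if current is None:
            result["new"].append(row["Name"])
            continue
        # Compare only the columns both files share
        shared = set(row) & set(current)
        changed = any(row[col] != current[col] for col in shared)
        result["overwritten" if changed else "unchanged"].append(row["Name"])
    return result

EXPORTED = (
    "Name,Parent Term Name,Definition\n"
    "Customer,,A person who buys\n"
    "Revenue,Finance,Income from sales\n"
)
TO_IMPORT = (
    "Name,Parent Term Name,Definition\n"
    "Customer,,A person or organisation who buys\n"
    "Revenue,Finance,Income from sales\n"
    "Margin,Finance,Revenue minus cost\n"
)

if __name__ == "__main__":
    print(preview_import(EXPORTED, TO_IMPORT))
```

Anything listed under "overwritten" is worth a second look from the term's steward before the real import.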

Stewards and experts must be specified using their Azure Active Directory email addresses.

Disaster recovery should be considered; exporting the glossary terms can be useful for that.

Multiple Business Glossaries

Multiple business glossaries in Microsoft Purview were introduced in December 2022. You can read the article here: Microsoft Purview now supports multiple business glossaries

There are some advantages to this practice.

  • To enable different parts of the business to manage the terms with ease
  • Where different parts of the business have different needs for a glossary

Glossary terms are not automatically applied to assets. They can be added manually, in bulk edit mode (up to 25 at a time), or with curated code using the Atlas API.
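Assigning a term through the Atlas API could look roughly like the sketch below. It assumes Purview exposes the Apache Atlas v2 endpoint `POST .../glossary/terms/{termGuid}/assignedEntities` under `https://<account>.purview.azure.com/catalog/api/atlas/v2`, that you already have a bearer token, and that you know the GUIDs of the term and the assets. The account name, GUIDs and token in the example are hypothetical. The function only builds the request; passing it to `urllib.request.urlopen` would send it.

```python
import json
import urllib.request

def build_assign_term_request(
    account: str, term_guid: str, asset_guids: list[str], token: str
) -> urllib.request.Request:
    """Build (but do not send) a request assigning one term to several assets."""
    # Assumed Purview path for the Apache Atlas v2 glossary API
    url = (
        f"https://{account}.purview.azure.com/catalog/api/atlas/v2/"
        f"glossary/terms/{term_guid}/assignedEntities"
    )
    # Atlas expects a list of related-object references identified by GUID
    body = json.dumps([{"guid": g} for g in asset_guids]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__":
    req = build_assign_term_request(
        "contoso-purview", "term-123", ["asset-1", "asset-2"], "TOKEN"
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # would send it; needs a real account and token
```

Keeping request building separate from sending makes it easy to review the payload in a lab environment before touching a production catalogue.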

A glossary term can have one of several statuses:

  • Draft – not officially implemented
  • Approved – current term
  • Expired – no longer used
  • Alert – requires attention
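When reviewing an exported or about-to-be-imported glossary, a small check against the statuses above can flag unknown values and terms in Alert that need attention. A minimal sketch, assuming the terms are available as (name, status) pairs and that the status values use exactly the spellings listed; the term names in the example are hypothetical.

```python
from collections import Counter

# Statuses described above. Assumption: the Status values in your export or
# import file use exactly these spellings.
VALID_STATUSES = {"Draft", "Approved", "Expired", "Alert"}

def summarise_statuses(terms):
    """terms: iterable of (name, status) pairs.

    Returns (counts per valid status, terms with an unknown status,
    terms in Alert status that need attention).
    """
    counts = Counter()
    invalid = []
    needs_attention = []
    for name, status in terms:
        if status not in VALID_STATUSES:
            invalid.append((name, status))
            continue
        counts[status] += 1
        if status == "Alert":
            needs_attention.append(name)
    return counts, invalid, needs_attention

if __name__ == "__main__":
    sample = [("Customer", "Approved"), ("Margin", "Draft"),
              ("Churn", "Alert"), ("Legacy", "Retired")]
    print(summarise_statuses(sample))
```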

More details about best practices can be found here.

Sunday, 1 January 2023

Happy New Year

Happy New Year!

Start the new year by speaking at Data Toboggan 2023.

The call for speakers (CFS) is open: https://sessionize.com/data-toboggan-2023

Event date: 28 January 2023