Welcome

Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein



Thursday, 13 May 2010

SQL Server 2008 R2

SQL Server 2008 R2 was released to Released to Manufacturing on 21 April 2010.

The official press release summarizes the main benefits as:
1. Managed self-service business intelligence (BI) for reporting and analysis
2. Enterprise-class scalability and greater IT efficiency
3. Platform integration spanning the data centre to the cloud

Tuesday, 9 March 2010

SQL Server Codenames

I was interested to see what all the SQL codenames were

Sphinx - SQL Server 7.0
Plato - SQL Server 7.0 OLAP Services
Shiloh - SQL Server 2000
Liberty - SQL Server 2000 (64-bit)
Rosetta - SQL Server Reporting Services
Yukon - SQL Server 2005
Katmai - SQL Server 2008
Kilimanjaro - SQL Server 2008 R2
Madison - Microsoft SQL Server 2008 R2 Parallel Data Warehouse
Bulldog - SQL Server 2008 R2 Master Data Services (Master Data Management)

Wednesday, 24 February 2010

Database Mail Catalog views

Database Mail Catalog views display metadata about the status of e-mail messages processed by Database Mail.

USE MSDB
select * from sysmail_allitems

Contains a row for each attachment submitted to Database Mail.

USE MSDB
select * from sysmail_mailattachments

Contains returned message details for Windows or SQL Server

USE MSDB
select * from sysmail_event_log

Contains a row for each message sent successfully

USE MSDB
select * from sysmail_sentitems

Contains one row for each failed message

USE MSDB
select * from sysmail_faileditems

Contains row for Messages with unsent or retrying status which are still in the mail queue

USE MSDB
select * from sysmail_unsentitems

Friday, 5 February 2010

SQL Server Version Numbering

This defines the product version numbering format for SQL Server. It is MM.nn.bbbb.rr which is defined as:
MM - Major version
nn - Minor version
bbbb - Build number
rr - Build revision number

Monday, 25 January 2010

Taxonomy

Taxonomy is the science of classification. It comes from the Greek taxis meaning arrangement or division and nomos meaning law. It provides a conceptual framework for discussion, analysis, or information retrieval. It arranges hierarchically, elements of a group (taxon) into subgroups (taxa) that are mutually exclusive. Each group has the same properties, behaviours, and constraints. And each increasing level has further properties, behaviours, and constraints.

In biological terms this would be the classification of species.

Monday, 18 January 2010

Fragmentation

Fragmentation is the storing of data non-contiguously on disk.

Detecting Fragmentation

The level of fragmentation can be determined by running the Dynamic Management View (DMV): sys.dm_db_index_physical_stats.
To return information for all databases detailing all statistics for all tables and indexes within the instance of SQL Server run

SELECT * FROM sys.dm_db_index_physical_stats (NULL, NULL, NULL, NULL, NULL);

A couple of the DMV columns are defined below

avg_fragmentation_in_percent: Logical fragmentation for indexes, or extent fragmentation for heaps in the IN_ROW_DATA allocation unit.. The percent of logical fragmentation (out-of-order pages in the index). The lower this value, the better it is. If this value is higher than 10%, some corrective action should be taken.

avg_page_space_used_in_percent : Average percentage of available data storage space used in all pages. Higher the value, the better it is. A value is lower than 75%, some corrective action should be taken.

Reducing Fragmentation

After the degree of fragmentation is identified from sys.dm_db_index_physical_stats, a guideline on the course of action to be taken is listed below



















avg fragmentation in percentSuggested Action
> 5%No action required
> 5% and < = 30%ALTER INDEX REORGANIZE
> 30% ALTER INDEX REBUILD WITH (ONLINE = ON) can be executed online or offline. A replacement for DBCC DBREINDEX

Friday, 20 November 2009

Filestream

There are various types of data
  • Structured Data is data which is stored in a relational database
  • Semi Structured Data is often of xml type and can be stored in the database
  • Unstructured Data is such as images, video etc
Unstructured Data or binary large object (BLOB) data needs to be reviewed to decide where it is stored. BLOB’s smaller than 256kb are better stored inside a database and BLOB’s larger than 1 MB, for filestream, are best stored outside of the database. Storing blobs in the database using the data type varbinary(max) is limited to 2GB per blob.

Filestream allows unstructed data to be stored in the NTFS file system in directories called data containers which are listed as filegroups in the database. This allows transactional consistancy between structured and unstructured data and allows point in time recovery. Deleting or renaming or any filestream files directly in the file system will corrupt the database. Each row has a unique row ID. Fragmentation can occur on the NTFS file system and regular defragmentation is required. Streaming of data through the filestream is more efficent than through the conventional database set up however backing up the database with filestream files is slower than just backing up the database. Each Filestream data container may need to be on it’s own volume to stop data contention. Data stored on the NTFS file system can be compressed but it is expensive if the data is uncompressable. Filestream requires integrated security and filestream cannot block antivirus scanning. If a file is infected it is best to quarantine the file and use DBCC CHECKDB to identify the missing file. Filestream has a garbage collection process to remove files no longer required, which is automatic, when the checkpoint process runs.

Filestream has to be enabled at the windows level before the SQL Server level.

Filestream data can not be encrypted. Database Mirroring does not support filestream although failover clustering does.