Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Friday, 20 November 2009


There are various types of data
  • Structured Data is data which is stored in a relational database
  • Semi Structured Data is often of xml type and can be stored in the database
  • Unstructured Data is such as images, video etc
Unstructured Data or binary large object (BLOB) data needs to be reviewed to decide where it is stored. BLOB’s smaller than 256kb are better stored inside a database and BLOB’s larger than 1 MB, for filestream, are best stored outside of the database. Storing blobs in the database using the data type varbinary(max) is limited to 2GB per blob.

Filestream allows unstructed data to be stored in the NTFS file system in directories called data containers which are listed as filegroups in the database. This allows transactional consistancy between structured and unstructured data and allows point in time recovery. Deleting or renaming or any filestream files directly in the file system will corrupt the database. Each row has a unique row ID. Fragmentation can occur on the NTFS file system and regular defragmentation is required. Streaming of data through the filestream is more efficent than through the conventional database set up however backing up the database with filestream files is slower than just backing up the database. Each Filestream data container may need to be on it’s own volume to stop data contention. Data stored on the NTFS file system can be compressed but it is expensive if the data is uncompressable. Filestream requires integrated security and filestream cannot block antivirus scanning. If a file is infected it is best to quarantine the file and use DBCC CHECKDB to identify the missing file. Filestream has a garbage collection process to remove files no longer required, which is automatic, when the checkpoint process runs.

Filestream has to be enabled at the windows level before the SQL Server level.

Filestream data can not be encrypted. Database Mirroring does not support filestream although failover clustering does.

Thursday, 19 November 2009

Data Collection Security

There are 3 roles


Grants read and write to the Management Data Warehouse (MDW). It allows users to purge and clean up jobs to manage the amount of data
Grants users with read only access
Allows write and upload of data to the MDW

Data Collection

This product is a development of performance dashboard, database reports and built on top of standard DMV's. Historical data is persisted and is not affected by service restarts.

Plan for data growth of up to 250 - 500 MB a day on the Management Data Warehouse (MDW). The default Data Retention Period in MDW is 4 weeks . Microsoft detected approximately 3 - 4% increase in CPU performance when running this application.

The System Data Collection Sets contain 3 types of data collector Disk Usage, Query Statistics and server activity. The data is stored in a Management Data Warehouse (MDW). The data is collected by running SQL Server Agent jobs and have associated SSIS packages.
Disk Usage Data Collection
This captures information about disk usage for data and log files for each database. This can be used to show disk space trends
Query Statistic Data Collection
This collects information about different queries that run on the server and statistics to help in tuning efforts and identify the greatest impact on resource usage
Server Activity Data Collection
Colelcts CPU, Memory and performance counts. Helps identify bottlenecks