Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Friday 20 November 2009


There are various types of data
  • Structured Data is data which is stored in a relational database
  • Semi Structured Data is often of xml type and can be stored in the database
  • Unstructured Data is such as images, video etc
Unstructured Data or binary large object (BLOB) data needs to be reviewed to decide where it is stored. BLOB’s smaller than 256kb are better stored inside a database and BLOB’s larger than 1 MB, for filestream, are best stored outside of the database. Storing blobs in the database using the data type varbinary(max) is limited to 2GB per blob.

Filestream allows unstructed data to be stored in the NTFS file system in directories called data containers which are listed as filegroups in the database. This allows transactional consistancy between structured and unstructured data and allows point in time recovery. Deleting or renaming or any filestream files directly in the file system will corrupt the database. Each row has a unique row ID. Fragmentation can occur on the NTFS file system and regular defragmentation is required. Streaming of data through the filestream is more efficent than through the conventional database set up however backing up the database with filestream files is slower than just backing up the database. Each Filestream data container may need to be on it’s own volume to stop data contention. Data stored on the NTFS file system can be compressed but it is expensive if the data is uncompressable. Filestream requires integrated security and filestream cannot block antivirus scanning. If a file is infected it is best to quarantine the file and use DBCC CHECKDB to identify the missing file. Filestream has a garbage collection process to remove files no longer required, which is automatic, when the checkpoint process runs.

Filestream has to be enabled at the windows level before the SQL Server level.

Filestream data can not be encrypted. Database Mirroring does not support filestream although failover clustering does.

No comments:

Post a Comment

Note: only a member of this blog may post a comment.