Welcome

Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk . Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein



Friday, 14 October 2011

Big Data – What is the Big Deal?

The 3rd SQL PASS Keynote was given by David J. DeWitt of the Data and Storage Platform Division. It was a brilliant, insightful session.

The session started by explaining the definitions of Big Data. At its simplest, it is a massive collection of records. To some, “Big Data” means using a new NoSQL system like Hadoop and MapReduce, or the old traditional parallel relational DBMS, to manage the data. Data is the currency of this generation, with the realization that data is too valuable to delete.

NoSQL
Not Only SQL - It's about recognizing that for some problems other storage solutions are better suited. NoSQL offers a flexible data model, faster time to deliver, a relaxed consistency model such as eventual consistency, the willingness to trade consistency for availability, and low upfront software costs. Some data is just not worth storing in a relational database, with all the validating, cleansing, ETL, analyzing and quality control that goes with it.
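To make the relaxed consistency model concrete, here is a minimal sketch of eventual consistency. It is a toy written for this post, not how any particular NoSQL product works: the class names, the two-replica setup and the explicit `sync()` step are all assumptions made for illustration.

```python
class Replica:
    """One copy of the data (hypothetical, in-memory only)."""
    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: writes land on one replica and are copied to the others later."""
    def __init__(self, replica_count=2):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # writes not yet propagated to the other replicas

    def write(self, key, value):
        # The write is acknowledged as soon as the first replica has it.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # A read may hit any replica, including one that is behind.
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Propagate pending writes to every other replica.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("balance", 100)
stale = store.read("balance", replica_index=1)  # replica 1 has not caught up yet
store.sync()
fresh = store.read("balance", replica_index=1)  # replica 1 now has the write
```

The store stays available for reads and writes throughout, at the price of a window in which a reader can see old data. That is the trade of consistency for availability described above.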

There are 2 types of NoSQL:

Key/Value Stores
Examples: Mongo, CouchBase, Cassandra, Windows Azure.
This is single value retrievals based on key - Think NoSQL OLTP.

Hadoop
This is large volumes of data stored in a distributed file system - Think NoSQL data warehousing.

SQL is sometimes termed 'schema first' and NoSQL 'schema later'.
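The difference between 'schema first' and 'schema later' can be sketched in a few lines. This is only an illustration under assumptions of my own: SQLite stands in for the relational side, a plain Python dictionary stands in for a key/value store, and the `customer` table and key names are made up.

```python
import json
import sqlite3

# Schema first: the table structure must exist before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (1, "Ada"))
sql_name = conn.execute(
    "SELECT name FROM customer WHERE id = ?", (1,)
).fetchone()[0]

# Schema later: a toy key/value store holds opaque values, and structure
# is only interpreted when a value is read back.
kv_store = {}


def put(key, value):
    kv_store[key] = json.dumps(value)


def get(key):
    return json.loads(kv_store[key])


put("customer:1", {"name": "Ada"})
# A new field appears on one record without any ALTER TABLE.
put("customer:2", {"name": "Grace", "twitter": "@grace"})
kv_name = get("customer:2")["name"]
```

Each lookup on the key/value side is a single retrieval by key, which is why it maps naturally onto the OLTP style of access.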

The other idea presented throughout the session was that there are two universes in the new reality: structured vs unstructured.


This is not a paradigm shift. The world has changed, and the new reality is that RDBMS and NoSQL databases need to work together in a complementary fashion to address current requirements.

The rest of the session went on to explain Hadoop and its ecosystem, and how the 2 technologies work together.

Hadoop = HDFS (file system store) + MapReduce (programming paradigm, process)
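The MapReduce paradigm can be sketched in plain Python with the classic word count. This is a single-process illustration only, not Hadoop code: the function names are my own, and a real job would run the map and reduce steps in parallel across the cluster, reading its input from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big deal", "data is big"])
# counts == {"big": 3, "data": 2, "deal": 1, "is": 1}
```

Because each map call only sees one line and each reduce call only sees one key, the work can be spread over many machines without the steps needing to coordinate with each other.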

Some applications need data from both universes in the new world. Where this is the case, Sqoop is used to connect the unstructured universe (Hadoop) to the structured one (RDBMS).



Thursday, 13 October 2011

The fantastic 12 of SQL Server 2012

The Day 2 keynote of the SQL PASS Summit was given by Quentin Clark, Corporate Vice President, SQL Server Database Systems Group, and it covered these 12 sections.

Summary slide from the keynote 

Big Data
Microsoft has partnered with Hortonworks to provide a Hadoop-based Windows Azure service, which will be out before the end of the year. "The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model."
ODBC drivers for Linux and Change Data Capture for SSIS and Oracle
Interoperability: new drivers for PHP, Java and Hadoop

SQL Azure changes
Backing up SQL Azure databases was shown in the demo
SQL Azure will support 150GB databases and any collation by the end of the year
The Azure management portal uses the Metro UI
SQL Azure federations to scale

Appliances
Microsoft SQL Server appliances and reference architectures allow customers to deploy data warehouse (DW), business intelligence (BI) and database consolidation solutions in a very short time, with all the components pre-configured and pre-optimized. The Fast Track Data Warehouse 3.0 Reference Guide and blueprint is available at http://msdn.microsoft.com/en-us/library/gg605238.aspx . The three appliances are:
  • HP Business Decision Appliance (BDA)
  • Microsoft and HP Business Data Warehouse appliance (BDW)
  • Fast Track Data Warehouse

Wednesday, 12 October 2011

SQL Server 2012

The Day One keynote of PASS Summit 2011, on Wednesday 12 October, was given by Ted Kummert, Senior VP, Business Platform Division, Microsoft Corp.

The key points announced were:

The name for SQL Server Denali is SQL Server 2012.

SQL Server Data Tools is the name for Project Juneau. It brings a new paradigm for database development within the familiar toolset of Visual Studio for T-SQL development.

Power View is the name for Project Crescent. It provides an ad-hoc reporting tool for business users. 

SQL Server 2012 is to support Hadoop. Oracle and EMC will also release distributions of Hadoop, and the Greenplum database and Hadoop will run on the same platform.

The Hive ODBC driver will be available next month.

Data Explorer was announced. The Data Explorer tool allows you to browse data sources in the cloud. It has phases of discovery, enrichment and sharing, and it can handle Big Data by joining multiple data sets from, say, the cloud, Excel, the marketplace, SQL etc.

The PASS Summit 2011 live keynote streaming is at
http://www.sqlpass.org/summit/2011/Live/LiveStreaming.aspx

Taken from the SQL Server UK Tech Days, the features include


Wednesday, 5 October 2011

Database-as-a-Service

The number of database-as-a-service providers is rapidly increasing. These are a few I found:

SQL Azure

SQL Azure delivers cloud database services which enable you to focus on your application, instead of building, administering and maintaining databases. It is built on SQL Server technologies and is a component of the Windows Azure platform.
http://www.microsoft.com/windowsazure/sqlazure/

VMware vFabric Data Director
The first database supported on Data Director is VMware vFabric Postgres 9.0 (vPostgres),..
http://www.vmware.com/products/datacenter-virtualization/vfabric-data-director/overview.html

Project RedDwarf – Database as a Service
http://www.openstack.org/blog/2011/04/announcing-project-reddwarf-database-as-a-service/

Amazon Relational Database Service (Amazon RDS)
Amazon RDS automatically patches and backs up your database, storing the backups for a user-defined retention period and enabling point-in-time recovery.

You benefit from the flexibility of being able to scale the compute resources or storage capacity associated with your relational database instance via a single API call. Amazon RDS supports both MySQL and Oracle databases.
http://aws.amazon.com/rds/

Sunday, 2 October 2011

SQLBits 9 Query across the Mersey


Held at the Britannia Adelphi hotel in Liverpool, SQLBits 9 was yet another amazing conference. It was a privilege to be an official helper for this event. The 3-day event started with conference organization on the Wednesday evening, followed by the training day seminar. I attended Upgrading your DBA Skills to SQL Server Denali with Christian Bolton of Coeo. The session covered various parts of the Box (on-premise server) element of the Denali strategy. By day 2 the heat in the hotel was overpowering due to the heatwave. The sessions on Friday and Saturday covered an array of talks across the database landscape. The keynote on the Friday covered the ecosystem of the Appliance strategy. The third part of the Denali strategy, Cloud, was not covered during this event.
Here are some observations on key topics of interest during the event.

There was a significant buzz about the upcoming release of Denali, the next version of SQL Server (announced at SQL PASS yesterday as SQL Server 2012), which will ship in the first half of next year.

The Microsoft strategy for SQL Server consists of Box (on-premise server), Cloud (SQL Azure) and Appliance (prebuilt and preconfigured servers). There are a few key items which will no longer be supported. It will not be possible to upgrade from SQL Server 2005 in one go, and Data Transformation Services (DTS) will not be supported.

Further advancements in the business intelligence (BI) arena will provide an end-to-end enterprise information management platform. The BI suite consists of enhancements to Master Data Services (MDS) and Integration Services (SSIS), a new tool for Data Quality Services (DQS), and an impact analysis and lineage tool (currently Project Barcelona). However, it is unlikely that Project Barcelona will ship in the initial release of Denali. In addition to this, significant improvements have been made to Reporting Services, under Project Crescent, to provide a visual design experience and to revolutionise query performance using the new column store for OLAP cubes.

For database administrators, the upgrade and planning tools for installing database servers and migrating databases have been significantly overhauled, so that it is possible to install the latest versions of the software and replay workloads to aid in migration and testing. In the service environment, SQL Trace / Profiler, currently used to monitor event data for troubleshooting, will be deprecated and Extended Events will replace that functionality.

Denali will introduce further security improvements and changes to facilitate compliance and increase flexibility and manageability. These are just a few of the changes, in addition to the changes to the high availability and disaster recovery solutions. AlwaysOn, a new component, provides improvements to high availability using a mixture of 2 existing technologies, mirroring and clustering, and can span geographically dispersed locations with the ability to add multiple replicas.
The development process is set to change at a later date with codename “Juneau”.

In summary, a vast number of new technological components were discussed, from the new HP / Microsoft appliances to NoSQL databases to existing SQL Server components. Denali will be a major release of SQL Server, the largest since SQL Server 2005. The main takeaway is that SQL Server is no longer the small database server application that doesn’t scale or perform.