Passionately curious about Data, Databases and Systems Complexity. Data is ubiquitous, the database universe is dichotomous (structured and unstructured), expanding and complex. Find my Database Research at SQLToolkit.co.uk. Microsoft Data Platform MVP

"The important thing is not to stop questioning. Curiosity has its own reason for existing" Einstein

Monday 30 July 2018

Systems | Fueling future disruptions

It is an exciting time of year, with so many informative and knowledge-sharing conferences from Google to Microsoft. The two-day Microsoft Faculty Summit 2018 takes place on 1 - 2 August 2018. It brings together leaders and researchers from the broad systems research area in computer science. Systems research is the foundation from which innovation grows, and it has the potential to disrupt the future.

The conference guide can be downloaded

Saturday 28 July 2018

What is Best Practice?

Best practice is a pervasive term that means different things to different people. Best practice has been defined in various ways (Dembowski 2013; Wellstein & Kieser 2011; Sanwal 2008). Dani et al. (2006) stipulate “A best practice is simply a process or a methodology that represents the most effective way of achieving a specific objective”. Jarrar & Zairi (2000) state that the term best practice is often used within organizations to depict leadership and is recognised as the best way to achieve superior results. In the glossary of benchmarking terms (American Productivity and Quality Centre 1999) cited in (Jarrar & Zairi 2000, p.S734) best practices were defined “Those practices that have been shown to produce superior results; selected by a systematic process; and judged as exemplary, good, or successfully demonstrated. Best practices are then adapted to a particular organisation”. Many different situations require different best practices and with new technology evolving ‘best’ is a moving target (Jarrar & Zairi 2000). 

Markus (2011, p.4) argued that the cultures and practices that develop over time in organizations have changed to become “off-the-shelf” services labelled best practice standards, which organizations needed to adopt and understand. Markus argued the change from unique coded management ideas for handling packages to standard software with relentless upgrades requires knowledge development and standard practices.

Sanwal (2008) stated that the use of best practices is affected by certain beliefs:
  • Best practices help make decisions quickly in a complex uncertain world. 
  • Best practices are easier because they have been proven by other organizations who also operate with complex and uncertain elements. 
  • Management understanding of other organizations in the field is organization specific. Best practices are often developed later and are often already behind leading organizations. 
  • Value must be gained from best practices as other experts, consultants and vendors share them for current trends. 
  • Best practices can improve performance. 

Falconer (2010) argued to the contrary that best practice exacerbates failure:
Best practice is flawed because it acts as a placeholder for proper management practice, displacing accountability for effectiveness and fit. Best practice is flawed, further, because it supplants strategy, adopting solutions out of convenience or copying them reactively, and supplants innovation, allowing “the best we know about”, “the best we’ve come across”, or even “the best we’ve done before” to be adequate. Best practice considers the world predictable, and discounts the emergence of better, novel ideas (Falconer 2010, p.754).
Falconer thought that problem situations are being incorrectly handled due to best practices replacing analysis.

Sanwal (2008) pointed out that changing these best practices in the multidimensional world requires consideration of organizational culture and behaviour, organization processes and organizational systems. As Gonnering stated,
“‘Best Practices’ can serve as a beginning but adaptation will most likely be necessary. Outcome is an emergent property, and the organization that has taken the time to learn the methodology of improvement will reap the benefits. The ‘continuous’ in ‘continuous quality improvement’ depends upon rapid-cycle, small-scale serial innovation and not a static and dogmatic adherence to past processes.” (2011, p.100) 

Gonnering argued that complex problems using best practices failed to have positive outcomes and forced the complex systems to become chaotic. Bretschneider et al. (2004) highlighted three important characteristics of best practice: a comparative process, with action, and linked to an outcome or goal. Nattermann (2000) suggested best practice might be the most widely used management tool in business and important for improving operational efficiency, but for strategic decision making, best practices might not be the best way forward to increase profit margins. Best practices management could be used to benchmark performance, with certain benchmarks being required to demonstrate best practices.

The core or classic best practices utilised within the database community have been developed through the sharing of knowledge, experience and actual outcomes across the sector. The improvement of these best practices was raised by Gratton & Ghoshal (2005) with the term “a signature process”, a process that envelops the company’s character and idiosyncratic nature. This signature process could advance the company, although it required careful adaptation and alignment to business goals to succeed. However, the classic best practices that were clear, logical and easy to understand held the most allure and were the ones shared within the database community, the body of knowledge often yielding optimal results (Tucker et al. 2007). Some best practices were tightly coupled with their organizations and inseparable from the context (Becker 2004).

Jarrar & Zairi (2000) identified three types of best practice: proven best practice across organizations, good practice techniques for an organization, and unproven good ideas based on intuition. Unproven ideas had drawbacks: their success could be a matter of luck, and they lacked the information needed to reduce risk, the situational context, the application criteria or a success measure (Falconer 2011). Nevertheless, this serendipitous discovery could lead to ease of deployment and innovation.

The Cynefin framework (Snowden & Boone 2007) classified and ordered simple systems in the domain of best practices. In an earlier paper in the chaos domain Kurtz & Snowden (2003) argued that applying best practices probably caused the chaos in the first place. They argued that different contexts use different management responses and that there are different tools for the management of complex contexts. The best practices domain is based on cause and effect relationships that have simple contexts, often within areas that do not change frequently.

Wagner and Newell (2011, p.400) stated that “The best way of operationalizing a process in one context and at one point in time may be different in another context and time”. They contended that there is no such thing as best practice, as knowledge is created by engagement in a practice. Practice is always changing and emergent with inconsistencies in the same practice, with best practice being defined locally. 

Wagner and Newell (2011, p.401) suggested a move to negotiated practice with a cooperative approach to best practice adoption. Their aim was to smooth out complex implementation through compromise. They highlighted problems with identifying best practice (due to it being an interactive process based on learning through implementation with information systems) and concluded that it sometimes required customisation to work well. This approach was also adopted by Avgerou & Land (1992) with their notion of ‘appropriate’ context specific practice, where information systems innovation looked for “best practice, or suitable new organizational form for the information age” (Avgerou 2011, p.650).

Avgerou drew together organizational and information systems to develop a framework which had one key tenet of a knowledge management system or a best practice solution to help address static and commoditized technology.

Best practices and procedures were continually developed by database software providers (e.g. Microsoft, Oracle and MongoDB) to enable the management of database systems to be carried out to the highest standards. The procedures were based on formal rules the business world defined which were sometimes called standard operating procedures (Becker 2004). Best practices were defined by the software providers as exemplary tested designs for certain configurations or ways of doing things. They were multi-faceted and resided in varying layers from architectural design, through development, to operational management.

The management of database systems utilizes best practices and procedures provided by software providers and often industry best practices shared by the community. McGregor (2007) argued that this rarely leads to great customer service. McGregor’s (2007) idea was that “Next Practice” was the future: continually analysing and looking for positive quality products and service in other organizations would bring ideas and innovation to improve the business. There was an aspiration to improve database management and improve business processes to provide good quality service when managing IT projects and database systems. Best practices might not, however, be the best solution. Sanwal (2008) raised some key issues with using processes and strategies created by other organizations, and did not believe that following these would create a better organization or bring about improvement.

Within database systems there are various types of practices and procedures that need to be incorporated within change processes. Savage (2014, p.17) stated Stonebraker thought “in memory” database engines will take over online transactional processing systems (OLTP). Savage (2014, p.16) shared Stonebraker’s views on the database world, that it could be divided into three types: OLTP, data warehouses and everything else (Hadoop, graph databases). This was likely to mean three or more database management and best practices models were required.

Best practices operate at different levels within the sphere of database management. There are technology best practices which deal with specific tasks for deployment of databases onto servers or into the cloud; and management best practices which relate to higher level functions and overall processes. In addition there are best practices which are defined by software vendors for their own products. As technology and management change in the world market, and more is understood about certain areas, best practices change. Thus best practices are replaced with new best practices. The large collection of best practices created is likely to be defined and owned by a multitude of people. This can cause problems with conflicting best practices. Sometimes there is a mismatch between best practices and a compromise needs to be found where possible.

Best practices are intended to be useful for technical solutions to help people provide the required results. They aim to provide a useful guide on what management need to do to perform certain tasks. Best practices are sometimes adapted from vendor or industry defined best practices for nonstandard configurations or different business scenarios. However, sometimes communication is lacking between the management requirements, the vendors’ practices and the technology tasks. Different teams may each create best practices, in places where the technology overlaps, which are not shared. There are therefore limitations to the usage of best practices. The best practices presented are significantly different for ILTM, CMM and ILTIL. There are many different types of tasks, from in depth technical ones to higher level models, that combined can produce a well-managed database system. Each task, model or part of the database system will have its own best practice, which aims to achieve those reliable results. These best practices at different levels may, in practice, sometimes be in conflict.

This discussion on best practice has shown there are many diverse views on the usability and definition of best practice. The working definition in my research (Holt, 2017) for best practice was: a recommended practice for carrying out actions for desirable outcomes, rather than always being the best way of doing something. The research best practice findings are in Holt et al. (2015) and the working cogs of best practice summarises the findings. 

American Productivity and Quality Centre. (1999). What is benchmarking. Retrieved from www.Apqc.org

Avgerou, C. (2011). Discourses on innovation and development in information systems in developing countries research. In R. D. Galliers & W. L. Currie (Eds.), The Oxford Handbook of Management Information Systems (p. 650). Oxford: Oxford University Press.
Avgerou, C., & Land, F. (1992). Examining the appropriateness of information technology. In S. Odedra & M. Bhatnagar (Eds.), Social Implications of computers in developing countries (pp. 26–42). New Delhi: Tata McGraw-Hill.
Becker, M. C. (2004). Organizational routines: a review of the literature. Industrial and Corporate Change, 13(4), 643–678. https://doi.org/10.1093/icc/dth026
Bretschneider, S. (2004). “Best Practices” Research: A Methodological Guide for the Perplexed. Journal of Public Administration Research and Theory, 15(2), 307–323. https://doi.org/10.1093/jopart/mui017
Dani, S., Harding, J. A., Case, K., Young, R. I. M., Cochrane, S., Gao, J., & Baxter, D. (2006). A methodology for best practice knowledge management. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, 220(10), 1717–1728. https://doi.org/10.1243/09544054JEM651
Dembowski, F. L. (2013). The Roles of Benchmarking, Best Practices & Innovation in Organizational Effectiveness. International Journal of Organizational Innovation, 5(3), 6–20.
Wagner, E., & Newell, S. (2011). Changing the story surrounding enterprise systems to improve our understanding of what makes ERP work in organizations. In R. D. Galliers & W. L. Currie (Eds.), The Oxford Handbook of Management Information Systems (p. 401). Oxford: Oxford University Press.
Falconer, J. (2010). “Best Practice” as Worst Practice: Broken Metaphor, Nude Emperor. Proceedings of the European Conference on Intellectual Capital, 754–762.
Falconer, J. (2011). Knowledge as Cheating: A Metaphorical Analysis of the Concept of “Best Practice.” Systems Research and Behavioral Science, 180, 170–181. https://doi.org/10.1002/sres
Gonnering, R. S. (2011). The Seductive Allure Of “Best Practices”: Improved Outcome Is A Delicate Dance Between Structure And Process. E-CO, 13(4), 94–101.
Gratton, L., & Ghoshal, S. (2005). Beyond Best Practice. MIT Sloan Management Review, 46(3).
Holt, V. et al. (2015). The usage of best practices and procedures in the database community. Information Systems, 49. https://doi.org/10.1016/j.is.2014.12.004
Holt, V. (2017). A Study into Best Practices and Procedures used in the Management of Database Systems. The Open University. Retrieved from http://oro.open.ac.uk/id/eprint/50950
Jarrar, Y. F., & Zairi, M. (2000). Best practice transfer for future competitiveness: A study of best practices. Total Quality Management, 11(4–6), 734–740. https://doi.org/10.1080/09544120050008147
Kurtz, C. F., & Snowden, D. J. (2003). The new dynamics of strategy: Sense-making in a complex and complicated world. IBM Systems Journal, 42(3). https://doi.org/10.1147/sj.423.0462
Markus, M. L. (2011). Historical Reflections on the Practice of Information Management and Implications for the field of MIS. In R. D. Galliers & W. L. Currie (Eds.), The Oxford Handbook of Management Information Systems (pp. 3–15). Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199580583.003.0002
McGregor, M. (2007). When Best Practice is Just Not Good Enough Why and How You Need to be Better than the Best. BPTrends, (July), 1–2.
Nattermann, P. M. (2000). Best practice does not equal best strategy. The McKinsey Quarterly, 2.
Sanwal, A. (2008). The Myth of Best Practices. Journal of Corporate Accounting & Finance, 19(5), 51–60. https://doi.org/10.1002/jcaf
Savage, N. (2014). The Power of Memory. Communications of the ACM, 57(9), 15–17. https://doi.org/10.1145/2641229
Snowden, D. J., & Boone, M. E. (2007). A Leader’s Framework for Decision Making. Harvard Business Review, 85(11), 68–76.
Tucker, A. L., Nembhard, I. M., & Edmondson, A. C. (2007). Implementing New Practices: An Empirical Study of Organizational Learning in Hospital Intensive Care Units. Management Science, 53(6), 894–907. https://doi.org/10.1287/mnsc.1060.0692
Wellstein, B., & Kieser, A. (2011). Trading “best practices”--a good practice? Industrial and Corporate Change, 20(3), 683–719. https://doi.org/10.1093/icc/dtr011

Friday 20 July 2018

Inspire 2018 - Intelligent Cloud and Edge

Inspire 2018 is in Las Vegas, Nevada, 15-19 July. This is Microsoft's premier event for partners. It has been interesting to see the discussions on Twitter during Inspire. These three quotes sum up a possible future state.

Without data, AI doesn't work. Data is the fuel for any AI service @TimNilmaa
It is our collective objective to democratize AI in every industry @Microsoft
The era of the intelligent edge has three defining characteristics AI, ubiquitous computing and people centred experiences @Microsoft

These three areas tie in with the Satya Nadella book 'Hit Refresh' that I have been reading. It is a truly extraordinary book that inspires you to innovate and use that innovation for a better future.

I found in my own research that people centred experiences drive much of what, where and how things are achieved. It is these small differences in inputs that have the possibility to change the outputs in such variable ways. This leads to an expanding complex environment. But what if you could understand that complexity and adjust the tasks that need to be carried out in diverse cross disciplinary teams by building in AI and using a CODEX (Control Of Data EXpediently) model when managing data and database systems? The result would stretch the breadth and depth, from on premises to cloud delivery, incorporating current and future states. A fully autonomous aid to managing data and database systems, using the CODEX that is described in my research, should help gain that intelligent edge.

Wednesday 11 July 2018

Microsoft Research Open Data Sets

The Microsoft Research Outreach team worked with the community to enable adoption of cloud based research. As a result they have launched Microsoft Research Open Data, a new data repository for the global research community. Microsoft wish to bring processing to the data rather than rely on data movement through the internet. This useful addition allows the data sets to be copied directly to the Azure based Data Science virtual machines. More details can be read here. The aim is to provide anonymized, curated and meaningful datasets that are findable, accessible, interoperable and reusable. This follows on from the data-intensive science fourth paradigm of discovery discussed by Jim Gray.

The open data set categories can be seen below. 

Saturday 7 July 2018

The Future State - Serendipitous Data Management

Gone are the days when companies could survive on existing products and services. The need to continually innovate to stay ahead in a fluid world requires a change in direction. Many articles have been written, in both academic research and industry, to try to predict what will be the future state of data technology and what will be this year's trends.

Currently research meets industry in a rebirth of industry-based research teams, consisting of organisational only teams or industry collaborating with universities. Guzdial shares his thoughts in the March 2018 issue of Communications of the ACM that "for the majority of new computer science PhD's, the research environment in industry is currently more attractive". In particular, the need within industry to continually innovate cries out for more research divisions in industry. Part of this change is due to the rapid expansion of emerging technology, but also to the realization of what data science and artificial intelligence (AI) can add to a business. Data science requires collaboration between people, teams and organisations, as interdisciplinary skills are needed to solve today’s problems.

There is an emerging trend whereby more research institutes have been created or existing ones are hiring more staff. Microsoft have created a new organization, Microsoft Research AI (MSR AI), to pursue game-changing advances in artificial intelligence. The research team combines advances in machine learning with innovations in language and dialog, human computer interaction, and computer vision to solve some of the toughest challenges in AI.

AI and machine learning intelligence, based on big data, is a complex problem to solve in order to empower people for the future. In the current world there is the need for collaboration. Greengard, in the March 2018 issue of Communications of the ACM, raised a concern that "mountains of data produce incremental gains, and coordinating all the research groups and silos is a complex endeavour". Managing data is complex and the key areas that I think will define the next revolution are in the graph.

Telling stories from the data is increasingly important in this ever-changing holistic environment. Skills need to be developed in this area as communicating the meaning of data is crucial. Aiming for improvement in business, science, robotics, space and health can initially appear through intelligent automation and can produce further actionable insights. 

Data visualization is a key component in telling the story and seeing anomalies. Parameswaran discussed at SIGMOD 2018 that it is scale that brings databases and visualisation together. He highlighted two problem areas: too many tuples and too many visualisations. It is an interesting point to consider how to address the excessive data points and how to find the right visualization for the data, to gain insight at speed.

Innovation is key to the next step. I believe that step is making beneficial discoveries by design, through scientific experiments on quality data, in a continuous and autonomous fashion. I call this Serendipitous Data Management. This improvement and innovation will come from having sound practices for big data management that enable actionable data insights at speed.

Another trend I am seeing in research and industry is looking at how data is processed in centralised data lakes and moving that processing to the edge, particularly for IoT at the moment. As well as increasing security, keeping the data at source also reduces the volume of data in transit, which is currently unsustainable. How to consolidate these distributed data sources and produce analysis across disparate systems is an interesting challenge to solve. In conclusion, systems built on data create a rapidly changing landscape, and I see the areas above as the key components in defining revolutionary changes to society and culture. 

Thursday 5 July 2018

2018 MVP Reactions

Peter Laker published an article about reactions from 2018's newest Most Valuable Professional (MVP) award winners. What a great set of reactions from some amazing people.

Privileged to have my comment listed on the reactions list.

Sunday 1 July 2018

Tutorial on Tree Based Modeling

I found a useful tutorial on tree based learning

The tutorial includes

  • What is a Decision Tree? How does it work?
  • Regression Trees vs Classification Trees
  • How does a tree decide where to split?
  • What are the key parameters of model building and how can we avoid over-fitting in decision trees?
  • Are tree based models better than linear models?
  • Working with Decision Trees in R and Python
  • What are the ensemble methods of tree based models?
  • What is Bagging? How does it work?
  • What is Random Forest? How does it work?
  • What is Boosting? How does it work?
  • Which is more powerful: GBM or Xgboost?
  • Working with GBM in R and Python
  • Working with Xgboost in R and Python
  • Where to Practice?
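
To give a flavour of the "How does a tree decide where to split?" topic, here is a minimal Python sketch of one common approach: choosing the threshold that minimises the weighted Gini impurity of the two child nodes. This is my own illustration and is not taken from the tutorial; the function names (gini, best_split) and the toy data are hypothetical, and real libraries add many refinements (multiple features, other criteria such as entropy, stopping rules to avoid over-fitting).

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split.

    If no split improves on the unsplit node, the threshold is None.
    """
    n = len(values)
    best = (None, gini(labels))  # baseline: impurity without splitting
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: one numeric feature and a two-class label.
ages = [22, 25, 30, 35, 40, 50]
bought = ["no", "no", "no", "yes", "yes", "yes"]

threshold, score = best_split(ages, bought)
print(threshold, score)
```

On this toy data the split at age 30 separates the two classes completely, so the weighted impurity drops from 0.5 to 0.0. A full decision tree repeats the same search recursively on each child node and across every feature.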