This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.
Choose one of the following:
- “Oh no! My data estate is in chaos. We can’t keep track of anything, and new projects get spawned ‘wherever’, leaving us with no governance, poor collaboration and frustrated data teams.”
- “Oh no! My data estate is too rigid. Our data engineering team insists on following out-of-date ‘best practice’. They have to, because of decisions we made in the past. This is leading to long lead times, expensive projects and frustrated data teams.”
If neither of these applies to you, then click away now. This article isn’t for you.
Chances are you are still with me.
The reason for both situations outlined above is a poor chaordic balance in your data system. Identifying this poor balance is the first step to being able to do something about it. By the end of this article you will know how to identify where your balance lies and what you can do to correct it.
What does Chaordic mean?
This rather handy portmanteau was coined by Dee Hock to describe organisations that are neither rigidly controlled nor anarchic. It is a combination of the words Chaos and Order, representing these two sides. An organisation should aim to find a balance between the two extremes.
Chaos and Order in data systems.
When we look at a data system, we can consider the chaordic balance too. On the chaotic side, we can consider a filing system with no enforced structure to be an anarchic data system. We have all worked in this way at some point, because it is ultimately flexible: it can store anything and is quick to change and adapt. The cost is that queries are complex, because there is no formal definition of what is in the data, and there is little you can do in the way of quality control and governance.
On the ordered side, we can consider an RDBMS that has a strong design-up-front requirement. This system is rigid and controlled. Great governance, simpler queries against formal definitions, but also rigidity and coupling that makes it difficult to change.
What underlies this is a spectrum relating to data schemata. When your data system accepts data in the schema of the generating system without changes, you are at the store-anything chaos end of the spectrum. When your data system requires the data it accepts to be in a predefined schema, you are at the tightly controlled order end of the spectrum. Between these two ends you have a range of technologies with different requirements. For example, a data lake (such as Azure Data Lake Storage) is a few steps more ordered than a filing system but can hold data of ‘any schema’. NOSQL databases (such as Cosmos DB) use implicit schemas: they require particular formats of data (JSON-like documents or node and edge graphs, for example) but the schema is never designed up front. While that may sound ideal, you need to enforce some kind of control over the schema of the data stored to get good application performance on query (as opposed to search). The almighty SQL Server offers you capabilities at many points on this spectrum.
Note: in the above I use NOSQL as opposed to NoSQL. The difference being between ‘Not Only SQL’ and ‘No SQL’. The world of databases that aren’t an RDBMS has evolved significantly since the original ‘No SQL’ term was defined. In the spirit of this article, finding a balance is what is important.
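To make the schema spectrum described above a little more concrete, here is a minimal Python sketch of three points on it: store anything, require a document format but no fixed fields, and require a predefined schema. Everything in it is an illustrative assumption rather than any real service's API: the function names, the in-memory lists standing in for stores, and the `ORDER_SCHEMA` fields are all hypothetical.

```python
from typing import Any

# Hypothetical predefined schema for the "order" end: field name -> required type.
ORDER_SCHEMA: dict[str, type] = {"customer_id": int, "amount": float}


def store_anything(store: list[Any], record: Any) -> None:
    """Chaos end: accept the record exactly as the generating system produced it."""
    store.append(record)


def store_document(store: list[Any], record: Any) -> None:
    """Middle: require a format (a dict, i.e. a JSON-like document) but no fixed fields."""
    if not isinstance(record, dict):
        raise TypeError("record must be a JSON-like document (dict)")
    store.append(record)


def store_with_schema(store: list[Any], record: Any) -> None:
    """Order end: reject anything that does not match the predefined schema."""
    if not isinstance(record, dict) or set(record) != set(ORDER_SCHEMA):
        raise ValueError("record fields do not match the schema")
    for field, expected in ORDER_SCHEMA.items():
        if not isinstance(record[field], expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    store.append(record)


def total_amount(store: list[Any]) -> float:
    """A query over the data; without a schema it must defend against every shape."""
    total = 0.0
    for record in store:
        if isinstance(record, dict) and isinstance(record.get("amount"), (int, float)):
            total += record["amount"]
    return total
```

The trade-off shows up in `total_amount`: the further towards chaos the store sits, the more defensive checks the query has to carry. With `store_with_schema` guarding the way in, the same query could drop those checks entirely, at the price of rejecting any record that does not fit the design made up front.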
When the balance is off.
If you do not have the right chaordic balance in your data estate, the data you store will be in conflict with your query requirements. For example, visualising data in Power BI requires the data you wish to visualise to be in a well-designed schema. Yes, there are tools to prepare the data, but the more time you spend doing this, the less time you spend creating great visualisations. The same applies to data science and machine learning. The now legendary adage goes: “80% of the time spent in a data science project is data preparation”. This is partly because your chaordic balance is off, and partly because data is only in the right ‘shape’ for an analysis after that analysis is done. If you meet a data set that is perfect for your needs, it means the analysis you are looking to do has been done before!
A modern data estate needs a modern chaordic balance that enables you to move with agility and adapt but can also be governed and secured to suit your business needs. Azure data services offer you the tools you need to strike this balance.
Getting the balance right.
As described so far, when the balance is off you will see less agility, higher costs and more time spent than you would expect in data work. Knowing when the balance is right is much harder because it is unlikely anyone will comment on it. “Success is praise in itself, let’s focus on what’s going wrong” is not an uncommon attitude. Good balance means projects will start faster and iterate quicker than they have done in the past. New questions will be answered from data, not with complaints about the system.
Azure data services offer capability right across the data schema spectrum. The filing systems, data lakes, NOSQL databases and relational databases mentioned above are all available, the last in several different flavours. Tools such as Data Catalog and Data Factory help you govern and move your data to different points on this spectrum. Azure Cognitive Search is there for when search is the right choice over query. One of the key capabilities that unlocks agility is the separation of data and compute, but that is a topic for another day.
There isn’t a one-size-fits-all point on the chaordic balance. Having a future-proofed modern data estate means having the agility to adjust as your business evolves.
Your chaordic journey.
As your business evolves, your needs evolve, and your data estate needs to evolve with them. If it doesn’t, your chaordic balance will slowly tilt in the wrong direction, leading to frustration. One way to redress the balance is by migrating and modernising to a more agile data estate in the cloud. This will allow you to evolve your capability more easily as your business grows and changes.
Migrating your data estate to the cloud shouldn’t be a lift-and-shift exercise.
Modernise your approach to data by embracing the chaordic balance. The services I link to above, along with Azure Synapse Analytics, are a great place to start.