What is hierarchical data?
Hierarchical data is a data structure in which items are linked to each other in parent-child relationships that together form a tree. Think of a family tree, with grandparents, parents, children, and grandchildren forming a hierarchy of connected data. Typical uses include an organizational chart, a project broken into tasks, or a taxonomy of language terms.
In hierarchical data, each “child” node has only one “parent,” but each parent can have multiple children. The first node, at the top of the hierarchy, is called the root node. When information needs to be retrieved, the tree is traversed from the root node down. Because every inquiry has to start at the root and follow the parent-child links, the system can be inflexible and slow for lookups that do not follow the hierarchy. Modern databases have evolved to support multiple hierarchies over the same data for faster, easier searching.
However, hierarchical data is still very widely used today. A common use of a hierarchical data structure is staffing information. In an organizational chart, the CEO is the root node at the top, with the rest of the staffing structure below.
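The parent-child rules described above can be sketched in a few lines of Python. This is only an illustration, not how any particular database stores its data: the `Node` class, the `find` and `path_to_root` helpers, and the job titles are all assumptions made up for the example.

```python
from typing import Optional


class Node:
    """One item in the hierarchy: a name, one parent, any number of children."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent      # exactly one parent (None for the root)
        self.children = []        # zero or more children

    def add_child(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child


def find(node: Node, name: str) -> Optional[Node]:
    """Depth-first search that starts at `node` and walks down the tree."""
    if node.name == name:
        return node
    for child in node.children:
        match = find(child, name)
        if match is not None:
            return match
    return None


def path_to_root(node: Node) -> list[str]:
    """Follow the single parent link upward until the root is reached."""
    names = []
    current: Optional[Node] = node
    while current is not None:
        names.append(current.name)
        current = current.parent
    return names


# Hypothetical organizational chart with the CEO as the root node.
ceo = Node("CEO")
cto = ceo.add_child("CTO")
cfo = ceo.add_child("CFO")
engineer = cto.add_child("Engineer")
accountant = cfo.add_child("Accountant")

print(path_to_root(find(ceo, "Engineer")))  # ['Engineer', 'CTO', 'CEO']
```

Because every node keeps a single parent link, the chain of command for any employee is one unambiguous path back to the root.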
Developed by IBM in the 1960s, the hierarchical data model is one of the earliest data models. However, it was largely replaced by the relational data model, which overcame some of the significant structural problems inherent in the hierarchical model.
Why is data structure important?
The structure of an organization’s data is vitally important. For a business to input, process, retrieve, and maintain information, choosing the right type of data structure is essential. Imagine your computer with no folder structure or naming conventions: the ability to find or use anything would be severely compromised.
Instead, in a business, you’ll find folders arranged by logical department names. Within those departments, folders could be arranged by financial quarter or in another way that makes sense for that business. Within those, the individual files are named descriptively, often with dates or versions recorded in a specific format, and stored in alphabetical order. This means that when an employee needs a certain file, it is simple to drill down and find it. In a similar way, a good data structure enables fast, efficient business practices.
A huge limitation of data science in general is that although the world is fluid and three-dimensional, computers perceive it as flat and unchanging. While people know that there are 200 dog breeds and 200 snake varieties, a computer just knows there are 400 items. A data structure is vital for grouping, accessing, processing, and viewing data in a way that computers can interpret.
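As a small illustration of the dog and snake point, the sketch below takes a flat list of items and groups each one under a parent category. The animal names and the `by_category` variable are invented for the example.

```python
from collections import defaultdict

# Hypothetical flat records: without structure these are just a list of items.
items = [
    ("dog", "Beagle"),
    ("dog", "Poodle"),
    ("snake", "Corn snake"),
    ("snake", "King cobra"),
    ("dog", "Border collie"),
]

# Group each item under its parent category.
by_category = defaultdict(list)
for category, name in items:
    by_category[category].append(name)

print(len(items))  # 5
print({category: len(names) for category, names in by_category.items()})
# {'dog': 3, 'snake': 2}
```

The flat count only says how many items exist, while the grouped structure says how many of each kind there are.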
When should an organization use hierarchical data?
Currently, organizations such as banks and telecommunications companies use hierarchical data in their applications because they require fast and highly accurate performance. The Windows registry is also an example of a hierarchical data structure. Complex manufacturing projects also often use hierarchical data models because of their large volumes of data.
Hierarchical data is best used when:
- The data can be stored in a “tree” form with a clear parent and child structure
- There is a need to capture the structure of the hierarchy
- There are high data volume requirements
- Existing systems are too complex to migrate to a relational or network model
Alternatives to hierarchical data
Relational data
The relational model is the most common data model. Data is grouped into relations, which are tables of rows and columns. Each row describes one item, each column holds one attribute of that item, and a value that is missing is recorded as null. Tables are linked to one another through shared values, so a row in one table can be related to rows in another. Unlike hierarchical data, which is confined by its one-to-many nature, relational data can express many-to-many relationships. Relational databases use Structured Query Language (SQL), the standard query language.
The relational model is excellent for maintaining data consistency across instances. For example, if a customer withdraws money from an ATM, that change is reflected in the balance shown in the phone app. This makes it well suited to keeping multiple instances of data correct across a range of systems.
In contrast to a hierarchical structure, the relational model stores data in tables rather than in a tree.
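To make the many-to-many point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, the sample rows, and the `projects_for` and `staff_on` helpers are assumptions chosen for illustration. The linking table is one common way to model this kind of relationship, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE project  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Linking table: each row connects one employee to one project.
    CREATE TABLE assignment (
        employee_id INTEGER NOT NULL REFERENCES employee(id),
        project_id  INTEGER NOT NULL REFERENCES project(id),
        PRIMARY KEY (employee_id, project_id)
    );
    INSERT INTO employee VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO project  VALUES (10, 'Website'), (20, 'Billing');
    INSERT INTO assignment VALUES (1, 10), (1, 20), (2, 10);
""")


def projects_for(name):
    """Return the titles of every project the named employee works on."""
    rows = conn.execute(
        """
        SELECT project.title
        FROM employee
        JOIN assignment ON assignment.employee_id = employee.id
        JOIN project    ON project.id = assignment.project_id
        WHERE employee.name = ?
        ORDER BY project.title
        """,
        (name,),
    ).fetchall()
    return [title for (title,) in rows]


def staff_on(title):
    """Return the names of every employee assigned to the titled project."""
    rows = conn.execute(
        """
        SELECT employee.name
        FROM project
        JOIN assignment ON assignment.project_id = project.id
        JOIN employee   ON employee.id = assignment.employee_id
        WHERE project.title = ?
        ORDER BY employee.name
        """,
        (title,),
    ).fetchall()
    return [name for (name,) in rows]


print(projects_for("Ana"))   # ['Billing', 'Website']
print(staff_on("Website"))   # ['Ana', 'Ben']
```

One employee belongs to several projects and one project has several employees, which a strict tree with a single parent per child could not represent directly.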
Network model
The network model, like the relational data model, was created to resolve the shortcomings inherent in hierarchical database models. In a network data model, a child can be linked to multiple parents, although in this model parents and children are called owners and members.
This model supports a range of relationships and is far more fluid than the hierarchical model. However, it is also much more complicated, which makes it difficult to manage and maintain. It is still not as flexible as the relational data model, and not all relations can be accurately modelled and linked to owners and members.
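The owner and member idea can be sketched with two lookup tables, one in each direction. The `link` helper and the example names are hypothetical, and a real network database would store these links very differently.

```python
from collections import defaultdict

# owner -> set of members, and member -> set of owners (hypothetical names).
members_of = defaultdict(set)
owners_of = defaultdict(set)


def link(owner, member):
    """Record that `member` belongs to `owner`; a member may have many owners."""
    members_of[owner].add(member)
    owners_of[member].add(owner)


link("Engineering", "Ana")
link("Safety committee", "Ana")
link("Engineering", "Ben")

print(sorted(owners_of["Ana"]))           # ['Engineering', 'Safety committee']
print(sorted(members_of["Engineering"]))  # ['Ana', 'Ben']
```

Here "Ana" has two owners at once, which is exactly what the one-parent rule of a hierarchy forbids.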
Advantages of hierarchical data structures
Data is easily retrieved
Because the links between data nodes are so well defined, finding and retrieving data is easy. Because parents and children are stored close together, navigation and data retrieval are fast too.
Referential integrity
The integrity of the data is always maintained because all changes made in the parent table are automatically propagated to the child table.
Simple structure
The upside-down, parent-child relationship structure is immediately and easily understood. It is conceptually simple, has a clear chain of command within the database, and as a result it offers high performance. Because of the simple structure, it also promotes data sharing.
Good security
Database security is provided and enforced by the database management system. It is so secure that some SQL developers say you’d have to be a magician to get the data out.
Challenges of hierarchical data structures
Inflexibility
The model is inflexible and rigid, and adding a node or relationship can disrupt the entire structure. A child record cannot exist without a parent, so adding an entry to the child table when no matching parent exists is difficult: the extra data must be entered in the parent table first. Similarly, it is challenging to move a child from one level to another.
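The sketch below shows why adding and moving nodes takes extra care. The `Hierarchy` class, its `add` and `move` methods, and the job titles are assumptions for illustration only, not the interface of any real hierarchical database.

```python
class Hierarchy:
    """Stores each node's single parent; the root's parent is None."""

    def __init__(self, root):
        self.root = root
        self.parent_of = {root: None}

    def add(self, name, parent):
        """Add a child; the parent must already exist in the hierarchy."""
        if parent not in self.parent_of:
            raise KeyError(f"parent {parent!r} must be added before {name!r}")
        if name in self.parent_of:
            raise ValueError(f"{name!r} already has a parent")
        self.parent_of[name] = parent

    def move(self, name, new_parent):
        """Re-parent a node, rejecting moves that would break the tree."""
        if name == self.root:
            raise ValueError("the root node cannot be moved")
        if name not in self.parent_of or new_parent not in self.parent_of:
            raise KeyError("both nodes must already exist")
        # Walk up from the new parent; meeting `name` would create a cycle.
        current = new_parent
        while current is not None:
            if current == name:
                raise ValueError("cannot move a node under its own descendant")
            current = self.parent_of[current]
        self.parent_of[name] = new_parent


org = Hierarchy("CEO")
org.add("CTO", "CEO")
org.add("Engineer", "CTO")

try:
    org.add("Analyst", "CFO")  # "CFO" has not been added yet
except KeyError as error:
    print("rejected:", error)
```

Even in this tiny version, a move has to check the whole chain of ancestors before it is allowed, which hints at why restructuring a large hierarchy is costly.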
Only suitable for one-to-many relationships
When one parent has many children, those relationships are easy to show. However, many-to-many relationships are more difficult to express. Anything more complex than a parent-child relationship is not supported in hierarchical data. A child cannot be linked to more than one parent, and there is no way to link children that belong to different parents.
Deletions
If a parent is deleted, all of its children (and their descendants) are automatically deleted too. On your desktop computer, deleting a folder deletes all the files within it. A hierarchical database handles deletions in a similar way.
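The folder analogy can be sketched as follows. The `delete_subtree` function and the folder and file names are made up for the example, and the tree is kept as a plain dictionary that maps each parent to its children.

```python
def delete_subtree(children, name):
    """Remove `name` and every descendant; return the removed names."""
    removed = []
    stack = [name]
    while stack:
        current = stack.pop()
        removed.append(current)
        # Dropping a parent also queues all of its children for removal.
        stack.extend(children.pop(current, []))
    # Detach the deleted node from its own parent's child list.
    for kids in children.values():
        if name in kids:
            kids.remove(name)
    return removed


# Hypothetical folder tree: parent -> list of children.
children = {
    "root": ["reports", "photos"],
    "reports": ["q1.xlsx", "q2.xlsx"],
    "photos": ["team.jpg"],
}

removed = delete_subtree(children, "reports")
print(sorted(removed))  # ['q1.xlsx', 'q2.xlsx', 'reports']
print(children)         # {'root': ['photos'], 'photos': ['team.jpg']}
```

Deleting "reports" silently takes both spreadsheets with it, which is convenient when intended and costly when not.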
Lack of standards
There is no standard data definition or data manipulation language. In general, the system relies on its rigidity to enforce standards, and this does not always work.
Complex to implement
Implementing a hierarchical data system requires an understanding of data storage characteristics and knowledge of the organizational structure. This makes it more complicated and difficult to implement than other systems.
The future of hierarchical data models
While some models still exist today that are hierarchical in nature, they are falling out of favor. It would be unusual to implement a hierarchical data model in a new system. Their rigidity, difficulty in implementation, and huge restrictions make them uneconomical and clunky to implement.
While they can be efficient in terms of accessing the data, their restrictive nature makes them less able to cope with the challenges of the overwhelming variety of data types and volumes.
When this model was created as one of the first data models, no one could have foreseen that the rapid pace of technology would render it obsolete, but that is what is happening. The future is in flexibility, the very characteristic that hierarchical data cannot offer. Networks and ecosystems are fast replacing hierarchies with a more organic way of storing and accessing data.