How should data be categorised? Does all data have to be held in a relational database, or are there other ways of storing it?
Follow the guide...
There are three types of data:
Structured data
All data organised and categorised into a formal framework
is considered to be structured data. The best example is the relational database
(RDBMS). This is how data has been stored and used for the last 40 years or so.
Examples include the Oracle database, Microsoft SQL
Server, DB2, MySQL, PostgreSQL, and so on.
Data is stored in normalised tables, with defined relationships between them for
storage and access.
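As a minimal sketch of what "normalised, with defined relationships" can look like in practice, the following uses Python's built-in sqlite3 module. The table names, column names and sample values are assumptions made up for illustration; they do not come from any real schema.

```python
import sqlite3

# In-memory database; all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Phone numbers live in their own table so that one employee
    -- can have any number of them (a one-to-many relationship).
    CREATE TABLE phone (
        id          INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL REFERENCES employee(id),
        number      TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO employee (id, name) VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO phone (employee_id, number) VALUES (?, ?)",
    [(1, "555-0100"), (1, "555-0101")],
)

# The defined relationship is what lets us join the two tables back together.
rows = conn.execute("""
    SELECT e.name, p.number
    FROM employee AS e
    JOIN phone AS p ON p.employee_id = e.id
    ORDER BY p.number
""").fetchall()
print(rows)
```

Every row in a table has the same columns, and the relationship between the tables is declared up front. That rigidity is the defining trait of structured data.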
Semi-Structured data
Relational databases covered most of the market's needs
for storing data. However, in some particular cases, a rigid data structure
model is not the answer.
Let’s say we want to store Computacenter’s directory
information. At first it sounds easy: a table like the following will suffice.
However, personal information is never so clear-cut. We all
have multiple phone numbers, e-mail addresses or even surnames, which means the table needs to be heavily
normalised.
The above picture was extracted from the following article: Social Network Database Design Sample - MySQL
Again, issues will surface soon enough when we try to:
- Include data unique to certain employees. For example, the rate of a consultant working on customer-facing engagements is not required for an accountant working in an administrative role.
- Include a hierarchy.
- Store all the data required by all types of employees. We would end up with hundreds of columns, all of them normalised, which is hardly manageable and hits the limits of the RDBMS.
It was clear that a different way of storing directory
information was required. The answer was the X.500 directory standard, from which
LDAP, the Lightweight Directory Access Protocol, was later derived.
LDAP defines data storage in a hierarchical tree using
attribute/value pairs for the information.
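To make the tree of attribute/value pairs more concrete, here is a rough sketch that models a few directory entries as a plain Python dictionary keyed by distinguished name (DN). It does not talk to a real LDAP server: the `example.com` entries and the helper functions `parent_dn` and `children` are hypothetical, and the DN handling is deliberately naive (it ignores escaped commas).

```python
# Hypothetical directory entries keyed by distinguished name (DN).
# Each entry is a set of attributes, and each attribute may hold several values.
directory = {
    "dc=example,dc=com": {
        "objectClass": ["top", "domain"],
        "dc": ["example"],
    },
    "ou=People,dc=example,dc=com": {
        "objectClass": ["top", "organizationalUnit"],
        "ou": ["People"],
    },
    "uid=jdoe,ou=People,dc=example,dc=com": {
        "objectClass": ["top", "inetOrgPerson"],
        "cn": ["John Doe"],
        "sn": ["Doe"],
        "mail": ["jdoe@example.com", "john.doe@example.com"],
        "telephoneNumber": ["555-0100", "555-0101"],
    },
}


def parent_dn(dn):
    """Return the DN one level up, or None if the DN has a single component.

    Naive: splits on the first comma and ignores escaped commas.
    """
    _, sep, rest = dn.partition(",")
    return rest if sep else None


def children(dn):
    """Return the DNs of entries stored directly below the given DN."""
    return sorted(d for d in directory if parent_dn(d) == dn)


print(children("dc=example,dc=com"))
print(directory["uid=jdoe,ou=People,dc=example,dc=com"]["mail"])
```

Note how the person entry carries two e-mail addresses and two phone numbers without any extra tables, and how the hierarchy is encoded in the DN itself.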
Other examples of semi-structured data are XML and HTML.
Semi-structured data is a form of structured data that does not conform with the formal structure of tables and data models associated with relational databases but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. Therefore, it is also known as schema-less or self-describing structure. <Wikipedia>
However, LDAP was created
specifically to manage directories and is widely implemented today, with LDAP servers like Sun One, Active Directory and OpenLDAP.
The XML mark-up language
later offered more flexibility in storing semi-structured data, readable by both
humans and machines, using a hierarchy and tags to define the attributes.
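As an illustration, the small XML document below describes two employees with different sets of fields, and Python's standard xml.etree.ElementTree module reads it back. The element names, attribute names and values are invented for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative document: the consultant has a <rate>, the accountant does not.
xml_text = """
<directory>
  <employee id="1" role="consultant">
    <name>Alice</name>
    <phone type="work">555-0100</phone>
    <phone type="mobile">555-0101</phone>
    <rate currency="GBP">500</rate>
  </employee>
  <employee id="2" role="accountant">
    <name>Bob</name>
    <phone type="work">555-0102</phone>
  </employee>
</directory>
"""

root = ET.fromstring(xml_text)

summary = {}
for emp in root.findall("employee"):
    rate = emp.find("rate")  # None when the element is absent
    summary[emp.findtext("name")] = {
        "phones": [p.text for p in emp.findall("phone")],
        "rate": rate.text if rate is not None else None,
    }

print(summary)
```

The tags describe the data as they go, so a record can simply omit a field it does not need or repeat one it needs several times, with no schema change required.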
Unstructured data
And finally, unstructured data is everything else. The term unstructured data refers to any data
that has no identifiable structure. For example, images, videos, email,
documents and text.
A good example of unstructured data is the Internet.