Data Reign: Categorise Data

How should data be categorised? Does all data have to be held in a relational database, or are there other ways of storing it?
Follow the guide...

There are three types of data:

Structured data

All data organised and categorised into a formal framework is considered structured data. The best example is the relational database management system (RDBMS), which is how most data has been stored and used for the last 40 years or so.

Examples include Oracle Database, Microsoft SQL Server, DB2, MySQL and PostgreSQL.

Data is stored in normalised tables, with defined relationships between them that govern both storage and access.
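
As a minimal sketch of what "defined relationships" means in practice, the snippet below uses Python's built-in sqlite3 module with an in-memory database. The department and employee tables and their contents are invented for illustration only; any RDBMS would express the same idea with its own SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: every employee must belong to an existing department.
conn.executescript("""
CREATE TABLE department (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER NOT NULL REFERENCES department(id)
);
""")

conn.execute("INSERT INTO department (id, name) VALUES (1, 'Consulting')")
conn.execute("INSERT INTO employee (name, department_id) VALUES ('Alice', 1)")

# The relationship is used for access: join the two tables on the key.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employee AS e
    JOIN department AS d ON d.id = e.department_id
""").fetchall()

# The relationship is also enforced on storage: department 99 does not exist.
try:
    conn.execute("INSERT INTO employee (name, department_id) VALUES ('Bob', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The database itself refuses the second insert, which is the kind of guarantee a formal framework buys you.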

Semi-Structured data

Relational databases covered most of the market's data storage needs. However, in some particular cases, a rigid data structure model is not the answer.

Let’s say we want to store Computacenter’s directory information. At first it sounds easy: a table like the following will suffice.

However, personal information is never so clear cut. We all have multiple phone numbers, e-mail addresses or even surnames, which means the table needs to be heavily normalised.

The above picture was extracted from the following article: Social Network Database Design Sample - MySQL
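
To make the normalisation step concrete, here is a small sketch, again with sqlite3. It is not the schema from the article mentioned above; the person and phone tables, and the numbers in them, are assumptions made up for this example. The repeating phone numbers are moved out of the person row into their own table, one row per number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical normalised layout: one person row, many phone rows.
conn.executescript("""
CREATE TABLE person (
    id      INTEGER PRIMARY KEY,
    surname TEXT NOT NULL
);
CREATE TABLE phone (
    person_id INTEGER NOT NULL REFERENCES person(id),
    kind      TEXT NOT NULL,
    number    TEXT NOT NULL
);
""")

conn.execute("INSERT INTO person (id, surname) VALUES (1, 'Smith')")
conn.executemany(
    "INSERT INTO phone (person_id, kind, number) VALUES (?, ?, ?)",
    [(1, "office", "555-0100"), (1, "mobile", "555-0101")],
)

# All numbers for one person, however many there are.
phones = conn.execute(
    "SELECT kind, number FROM phone WHERE person_id = ? ORDER BY kind", (1,)
).fetchall()
```

The same split has to be repeated for e-mail addresses, surnames and every other repeating attribute, which is why the table count grows so quickly.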

Again, issues surface as soon as we try to:

Include data unique to certain employees. For example, the rate of a consultant working on customer-facing engagements is needed, but this information is not required for an accountant working in an administrative role.
Include a hierarchy.

If we were to store all the data required by every type of employee, we would end up with hundreds of columns, all of them normalised, which is hardly manageable and pushes the RDBMS to its limits.
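
The column explosion can be illustrated with a few lines of plain Python. The two employee records and their field names (daily_rate, ledger) are hypothetical; the point is only that a single wide table needs one column for every attribute that any employee might have.

```python
# Hypothetical records: each role carries attributes the other does not need.
employees = [
    {"name": "Alice", "role": "consultant", "daily_rate": 800},
    {"name": "Bob", "role": "accountant", "ledger": "payroll"},
]

# Wide-table approach: the column list is the union of all attributes.
columns = sorted({key for record in employees for key in record})

# Each record becomes a row; attributes it lacks are stored as None (NULL).
wide_rows = [[record.get(col) for col in columns] for record in employees]

# Count the cells that exist only because another role needed the column.
empty_cells = sum(cell is None for row in wide_rows for cell in row)
```

With two roles the waste is two empty cells; with dozens of roles and hundreds of attributes, most of the table would be empty.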

It was clear that a different way of storing directory information was required. The answer was the X.500 directory standard, from which LDAP, the Lightweight Directory Access Protocol, was later derived.

LDAP defines data storage as a hierarchical tree, in which each entry holds its information as attribute/value pairs.
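
The shape of such a directory can be sketched in plain Python, without any LDAP library. Each entry is identified by a distinguished name (DN) that encodes its position in the tree, and each attribute may hold several values. The dc=example,dc=com tree, the entries and the helper functions below are all invented for illustration, and the DN handling is deliberately simplified (it ignores escaped commas).

```python
# Hypothetical directory: DN -> attributes, every attribute is multi-valued.
directory = {
    "dc=example,dc=com": {"objectClass": ["domain"]},
    "ou=people,dc=example,dc=com": {"objectClass": ["organizationalUnit"]},
    "uid=alice,ou=people,dc=example,dc=com": {
        "objectClass": ["inetOrgPerson"],
        "cn": ["Alice Smith"],
        "mail": ["alice@example.com", "a.smith@example.com"],
        "telephoneNumber": ["555-0100", "555-0101"],
    },
}


def parent_dn(dn):
    """Return the DN one level up the tree, or None for the root entry."""
    _, _, rest = dn.partition(",")
    return rest if rest in directory else None


def children(dn):
    """Return the DNs of the entries directly below the given entry."""
    return sorted(d for d in directory if parent_dn(d) == dn)


alice = directory["uid=alice,ou=people,dc=example,dc=com"]
```

Multiple e-mail addresses or phone numbers need no extra table here, and an entry simply omits the attributes that do not apply to it.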

Other examples of semi-structured data are XML and HTML.

Semi-structured data is a form of structured data that does not conform with the formal structure of tables and data models associated with relational databases but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. Therefore, it is also known as schema-less or self-describing structure. (Source: Wikipedia)

LDAP, however, was created specifically to manage directories. It is widely implemented today, in servers such as Sun ONE Directory Server, Active Directory and OpenLDAP.

The XML markup language later offered more flexibility for storing semi-structured data in a form readable by both humans and machines, using a hierarchy of elements and tags to define the attributes.
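
As a rough illustration of the self-describing nature of XML, the sketch below parses a small document with Python's standard xml.etree.ElementTree module. The directory document, its element names and its values are made up for this example. Note that the two employee records do not share the same set of fields, and no schema had to be declared beforehand.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: tags name the fields, nesting gives the hierarchy.
document = """
<directory>
  <employee id="1">
    <name>Alice</name>
    <phone type="office">555-0100</phone>
    <phone type="mobile">555-0101</phone>
    <rate currency="GBP">800</rate>
  </employee>
  <employee id="2">
    <name>Bob</name>
    <phone type="office">555-0200</phone>
  </employee>
</directory>
""".strip()

root = ET.fromstring(document)

# Repeating fields are simply repeated elements.
phones = {
    emp.findtext("name"): [p.text for p in emp.findall("phone")]
    for emp in root.findall("employee")
}

# Optional fields are simply absent; findtext returns None for them.
rates = {emp.findtext("name"): emp.findtext("rate") for emp in root.findall("employee")}
```

The reading program decides what to do with a missing rate; the storage format itself does not force every record into the same shape.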

Unstructured data

And finally, unstructured data is everything else. The term unstructured data refers to any data that has no identifiable structure. For example, images, videos, email, documents and text.

A good example of unstructured data is the Internet.

Wednesday, 18 July 2012
