Tuesday, November 11, 2014:
Hadoop has become the backbone of many applications, and Big Data is
hard to imagine without it. Hadoop offers distributed storage,
scalability and high throughput, and it is widely considered the
standard platform for high-volume data infrastructures. Still, there
are several reasons why Hadoop is not always the best solution for
every purpose. Let's discuss ten disadvantages of Hadoop here:
1. Pig vs. Hive:
Hive UDFs cannot be used in Pig, and HCatalog is required just to
access Hive tables from Pig. The reverse holds as well: Pig UDFs cannot
be used in Hive. So when extra functionality is needed in Hive, falling
back to a Pig script is rarely an attractive option.
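As a sketch of that HCatalog workaround, a Pig script has to go through HCatLoader and HCatStorer just to touch Hive tables (the table and column names here are hypothetical):

```pig
-- Pig cannot call Hive UDFs directly; HCatalog is the bridge to Hive tables.
-- Run with: pig -useHCatalog script.pig
sales = LOAD 'default.sales' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- Any extra logic must use Pig built-ins or Pig UDFs, not Hive UDFs.
big = FILTER sales BY amount > 100.0;
STORE big INTO 'default.big_sales' USING org.apache.hive.hcatalog.pig.HCatStorer();
```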
2. Security concerns:
Managing a complex application with Hadoop is a challenge in itself.
Hadoop's security model leaves much to be desired, and it is disabled
by default, so in complex deployments it often simply stays off. Data
is also at serious risk because Hadoop lacks encryption at the storage
and network levels, and without encryption data can be compromised far
too easily.
3. Big Data cravings:
Hadoop pays off mostly when a business genuinely runs on a Big Data
dataset. Before adopting it, you need answers to certain questions: how
many terabytes of data do you actually have, whether you have a steady
and heavy flow of incoming data, and how much of that data will really
be operated upon.
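A minimal back-of-the-envelope sketch of that sizing question, in Python (the ingest figures are illustrative assumptions, not recommendations):

```python
def projected_dataset_tb(daily_ingest_gb: float, retention_days: int,
                         replication: int = 3) -> float:
    """Estimate raw storage needed in TB, including HDFS replication."""
    return daily_ingest_gb * retention_days * replication / 1024

# Hypothetical numbers: 50 GB/day kept for one year, default 3x replication.
total_tb = projected_dataset_tb(50, 365)
print(round(total_tb, 1))  # ~53.5 TB of raw capacity
```

If the answer comes out in the low single digits of terabytes, a single well-provisioned database server may serve you better than a Hadoop cluster.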
4. Shared libraries forcefully stored in HDFS:
Hadoop keeps repeating this pattern. If a Pig script is stored in HDFS,
it is assumed that its JAR dependencies live there too, and the same
theme recurs in Oozie and other tools. Storing shared libraries in HDFS
is not a bad idea in itself, but keeping them consistent across a huge
organisation is a painful task.
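For illustration, the pattern looks like this in Pig (the jar path is hypothetical); every team has to agree on where the shared jars live and who upgrades them:

```pig
-- Register a shared UDF jar that the whole organisation keeps in HDFS.
REGISTER 'hdfs:///shared/libs/common-udfs-1.0.jar';
-- Every script with this line silently assumes that the path exists and
-- that the jar version there is the one the script was tested against.
```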
5. Vulnerable by nature:
Hadoop is risky from a security standpoint. The framework is written in
Java, a language whose very popularity makes it a favourite target of
cyber criminals, which leaves Hadoop correspondingly exposed to data
breaches.
6. Oozie:
Debugging is no fun. An error does not always mean you have done
something wrong: it can also be a protocol error caused by a
configuration typo or a schema validation failure, and these errors
fail on the server side. In such cases Oozie is not of much help,
especially if the workflow is not distributed properly.
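As an illustration of how easy it is to trip the schema validator, here is a minimal hypothetical Oozie workflow action; misspelling a single element name, say writing <job-trackr> instead of <job-tracker>, makes the server reject the whole workflow with a validation error before anything runs:

```xml
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>script.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```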
7. Unsuitable for small data:
Big Data does not always mean big business, and Big Data platforms are
not suited for small data needs. Hadoop is a prime example: it is
designed for high capacity, and its Hadoop Distributed File System
(HDFS) cannot efficiently support random reads of small files. Hence,
Hadoop is not the best solution for organisations that deal with small
amounts of data.
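The small-files problem is easy to quantify: each file and each block is an object in the NameNode's memory, and a common rule of thumb is roughly 150 bytes of heap per object. A rough sketch (the 150-byte figure is an approximation, not an exact Hadoop constant):

```python
BYTES_PER_NAMESPACE_OBJECT = 150  # rule-of-thumb NameNode heap cost per object

def namenode_heap_gb(num_files: int, blocks_per_file: int = 1) -> float:
    """Approximate NameNode heap used by file and block objects, in GB."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_NAMESPACE_OBJECT / 1024 ** 3

# 100 million one-block small files vs. the same blocks packed into
# 1 million large files of 100 blocks each.
print(round(namenode_heap_gb(100_000_000), 1))   # ~27.9 GB of heap
print(round(namenode_heap_gb(1_000_000, 100), 1))  # ~14.1 GB with 100x fewer files
```

The block count is the same in both cases; the extra heap is pure per-file overhead, which is why many small files hurt HDFS.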
8. Stability issues:
Hadoop is an open source platform developed by many contributors who
are still actively working on the project, so like any other open
source software it is constantly changing. Hadoop has had its share of
stability issues, and organisations are advised to run the latest
stable version to avoid these kinds of problems.
9. Documentation:
Hadoop's documentation is not very refined and contains several errors.
The shared examples are not always checked, which leads to mistakes.
The most formidable part is the documentation for Oozie, whose examples
do not even pass schema validation.
10. Repository management:
If you have ever installed anything from the Hadoop repositories, you
will know that they do not always behave properly, as they are poorly
managed, and compatibility is not always checked when a new component
is installed.