Call Us: 972-74-767-94-67

Big Data

Apache Hadoop is free, open-source software for massively distributed computation and storage.

Hadoop can store petabytes of data and process it quickly.

Hadoop does this with a cluster of many commodity servers (nodes). Each data node stores a portion of the data and also serves as a compute node that processes its local data. The Hadoop ecosystem uses the MapReduce model to process huge amounts of data at high speed: the Map step filters and sorts data on the local node, and the Reduce step performs the summary operation on the output of all nodes.
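To make the Map and Reduce steps concrete, here is a minimal word-count sketch in plain Python. It does not use Hadoop itself; the function names and the sample data are assumptions made for illustration, and each inner list stands in for the data held by one node.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: runs on one node over its local data and emits sorted (word, 1) pairs."""
    pairs = [(word.lower(), 1) for line in lines for word in line.split()]
    return sorted(pairs)  # local sort before the results leave the node

def reduce_phase(mapped_outputs):
    """Reduce: summarizes the map output of all nodes into one count per word."""
    totals = defaultdict(int)
    for pairs in mapped_outputs:
        for word, count in pairs:
            totals[word] += count
    return dict(totals)

# Hypothetical data: each inner list is the portion stored on one data node.
node_blocks = [
    ["big data on hadoop", "hadoop stores data"],
    ["data nodes process local data"],
]
word_counts = reduce_phase(map_phase(block) for block in node_blocks)
```

In a real cluster the map step runs in parallel on the nodes that hold the data, which is where the speed comes from; here it simply runs in a loop.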

The Hadoop storage layer is based on HDFS, the Hadoop Distributed File System. Hadoop splits files into large blocks and distributes them among the cluster's data nodes.
HDFS storage has the following advantages:

  1. Shared storage – storage is shared between all data nodes and HDFS clients.
  2. High availability – during writes, data blocks are replicated to other nodes, so if a node fails the data is still available for reads and writes on other nodes.
  3. Scalable – scaling out by adding data nodes grows storage and processing capacity accordingly.
  4. Performance – the more data nodes, the more I/O throughput the cluster delivers.
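The block splitting and replication described above can be sketched in a few lines of Python. This is a simplified model, not the HDFS implementation: the round-robin placement is an assumption made for illustration (HDFS uses its own placement policy), and the 128 MB block size and replication factor of 3 are used only as typical default values.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Split a file into blocks and assign each block to `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    n_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block_id in range(n_blocks):
        # Assumed round-robin placement, for illustration only.
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement

def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a surviving node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )

MB = 1024 * 1024
nodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
placement = place_blocks(300 * MB, 128 * MB, nodes)
still_readable = readable_after_failure(placement, "node2")
```

A 300 MB file becomes three blocks, and because each block lives on three different nodes, losing any single node leaves every block readable.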

The Hadoop ecosystem can be extended with submodules and software packages that are installed on top of or alongside Hadoop, such as Apache Hive, Apache HBase, Apache Spark, Impala and so on.

Hadoop is available today in two main distributions, which can be free or licensed:

  1. Cloudera – Distribution Including Apache Hadoop (CDH)
  2. Hortonworks Data Platform – HDP

 

Hortonworks and Cloudera merged in 2019.

Use Case:
Hadoop is a great solution for on-premises data warehouses or data lakes where data is loaded in batches and fast analytics on huge data sets is required, with the added benefit of a free license.

 

Our Services:

  • Big Data Architecture
  • Hadoop DevOps: cluster installation, performance tuning, upgrades, backup and recovery
  • Data engineering using Python and PySpark
  • ETL and analytics development using Impala and Spark
  • 24/7 support

Elasticsearch is a free, open-source search engine based on Apache Lucene. It comes with many extras, such as aggregations, analytics, ETL with Logstash and Kibana dashboards, which make it one of the leading search engines today and capable of serving as a centralized database for many products and companies.

Elasticsearch can store petabytes of data across many servers (nodes) and search and analyze it at very high speed.

Elasticsearch is a great database for unstructured data: data is stored in JSON format, which enables automatic schema detection and lets you add fields on the fly with no schema modification.
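The idea of detecting a schema from incoming JSON can be illustrated with a short Python sketch. This is not Elasticsearch code; the functions and the simplified type names are assumptions chosen to show how a new field can be picked up from a document without any schema change.

```python
def infer_type(value):
    """Guess a field type from a JSON value (simplified, illustrative type names)."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    return "text"

def update_mapping(mapping, document):
    """Add any field not seen before; fields already known keep their type."""
    for field, value in document.items():
        mapping.setdefault(field, infer_type(value))
    return mapping

mapping = {}
update_mapping(mapping, {"user": "dana", "age": 31})
# The second document introduces two new fields on the fly.
update_mapping(mapping, {"user": "omar", "premium": True, "score": 4.5})
```

After the second document, the mapping holds four fields even though nothing was declared in advance.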

By default, every field is indexed and can be searched with very fast response times.
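Indexing every field makes searches fast because a lookup replaces a scan. The sketch below builds a tiny inverted index in plain Python to show the principle; it is a simplified illustration with hypothetical documents, not how Lucene or Elasticsearch is implemented internally.

```python
from collections import defaultdict

def build_index(documents):
    """Map every (field, token) pair to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for field, value in doc.items():
            for token in str(value).lower().split():
                index[(field, token)].add(doc_id)
    return index

def search(index, field, term):
    """Look the term up directly instead of scanning every document."""
    return sorted(index.get((field, term.lower()), set()))

# Hypothetical documents for illustration.
documents = {
    1: {"title": "Hadoop cluster tuning", "tag": "bigdata"},
    2: {"title": "Cassandra cluster install", "tag": "nosql"},
}
index = build_index(documents)
hits = search(index, "title", "cluster")
```

Every field of every document ends up in the index, so any field can be searched without extra setup.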

For many applications there is no longer a need for a relational database (RDBMS) with tables and joins. Data can be stored as application JSON objects and inserted and queried much faster in JSON, a format the application already understands.

Elasticsearch supports scaling out by adding nodes, and high availability through replication between the nodes.

Use Case:
Elasticsearch is a great database for storing real-time structured and unstructured data with very fast queries and free-text search.

 

Our Services:

  • Big Data Architecture
  • Elasticsearch DevOps – cluster installation, performance tuning, backup and recovery
  • Data engineering using Python and PySpark
  • Application development using Python and Node.js
  • 24/7 support

Apache Cassandra is a free, open-source distributed wide-column store, initially developed by Facebook.

Cassandra is masterless: there is no master node to define. Every node reads and writes its local shard of the data, which provides high availability and speed.

Cassandra belongs to the NoSQL family but has a CQL (SQL-like) interface, which lets you create tables, primary keys and clustering columns and write "SQL" to insert and select data. Cassandra distributes the data between the nodes according to the partition key (the first part of the primary key), and so balances read and write requests evenly across the nodes.
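Distributing rows by key can be sketched with a simple hash in Python. This is an illustration only: Cassandra's default partitioner uses Murmur3 tokens and token ranges, while the sketch below substitutes MD5 and a modulo as a stand-in, and the node and key names are hypothetical.

```python
import hashlib
from collections import Counter

def node_for_key(partition_key, nodes):
    """Pick the node that owns a key by hashing the key (MD5 used as a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return nodes[token % len(nodes)]

nodes = ["node1", "node2", "node3"]  # hypothetical node names
# Count how many of 3000 hypothetical keys land on each node.
load = Counter(node_for_key(f"user-{i}", nodes) for i in range(3000))
```

The same key always maps to the same node, and because the hash spreads keys roughly uniformly, each node receives a similar share of the reads and writes.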

Because Cassandra has clustering columns, the application can retrieve data very quickly by key value and can also drill down very quickly on the clustering columns, something many other NoSQL key-value stores do not offer.
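The reason this drill-down is fast is that rows inside a partition are kept sorted by the clustering key. The Python sketch below models that idea with a sorted list and binary search; the `Table` class and the sensor data are hypothetical and only illustrate the access pattern, not Cassandra's storage engine.

```python
import bisect
from collections import defaultdict

class Table:
    """Toy table: partition key -> rows kept sorted by clustering key."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        # insort keeps each partition ordered by (clustering_key, value).
        bisect.insort(self.partitions[partition_key], (clustering_key, value))

    def range_query(self, partition_key, start, end):
        """Return rows whose clustering key lies in [start, end]."""
        rows = self.partitions.get(partition_key, [])
        keys = [ck for ck, _ in rows]  # key list built here for clarity
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        return rows[lo:hi]

table = Table()
# Hypothetical time-series readings, inserted out of order.
table.insert("sensor-1", "2024-01-01T12:00", 22.1)
table.insert("sensor-1", "2024-01-01T10:00", 20.5)
table.insert("sensor-1", "2024-01-01T11:00", 21.0)
table.insert("sensor-2", "2024-01-01T10:30", 18.3)
readings = table.range_query("sensor-1", "2024-01-01T10:00", "2024-01-01T11:00")
```

The partition key finds the right shard, and the sorted clustering key turns a time-range question into a slice of neighbouring rows, which suits the time-series and IoT use cases listed on this page.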

Cassandra supports scaling out by adding nodes, and high availability through replication between the nodes.

A suitably sized Cassandra cluster can support 1 million operations per second and very heavy read and write workloads in real time.

Use Cases:

Application types: social networks, time series, IoT, session management.

Our Services:

  • Big Data Architecture and Cassandra data modeling
  • Cassandra DevOps – cluster installation, performance tuning, backup and recovery
  • Data engineering using Python and PySpark
  • Application development using Python and Node.js
  • 24/7 support