Apache Hadoop Archives - GS Tech Blog

Apache Ranger Audit Logs stored in HDFS parsed with Apache Spark

by GS
in Apache, Apache Hadoop, Apache Hive, Apache Ranger, Apache Spark, audit, Hadoop, HDFS, Pyspark
on August 31, 2018

Using Apache Spark to parse a large HDFS archive of Apache Ranger audit logs to find and verify whether a user attempted to access files in HDFS, Hive, or HBase. This eliminates the need to use a Hive SerDe to read these Apache Ranger JSON files and to have to create an external…

Hadoop, Java and HTTPD and /etc/security/limits.d/ nproc/pid-max

by GS
in Apache, Hadoop, java, limits.d, Linux, nproc, pid, tid, Tuning
on March 1, 2016

After successfully running a large Hadoop cluster for a period of time, I started to notice strange things occurring, initially with the MapReduce PI example job, where tasks would be marked as failed. When looking more closely and attempting to log on/su/ssh to a machine with the user ID that was running the job, sshd/su would return: -bash:…

Hadoop and ip_conntrack: table full, dropping packet

by GS
in Hadoop, iptables, kernel tuning, Linux, tcp, Tuning
on February 29, 2016

I’m pretty sure many folks have seen this specific error across many different Linux systems, specifically when iptables is enabled and the OS has thousands of connections coming in per second. In my case, I ran into this with the Hadoop NameNode. Someone accidentally executed iptables -L to try to get a list…

Hadoop and Redhat System Tuning /etc/sysctl.conf

by GS
in Apache, Apache Solr, Flume, Hadoop, HDF, Hortonworks, kernel tuning, limits.d, Linux, nproc, pid, sysctl.conf, tcp, tid, Tuning
on February 28, 2016

One of the most overlooked things after building out a Hadoop cluster is operating system tuning. This post will cover how to tune the settings in /etc/sysctl.conf, also known as the Linux kernel settings. /etc/sysctl.conf ## ALWAYS INCREASE KERNEL SEMAPHORES especially IF using IBM JDK with SharedClassCache also a separate…

Integrating Apache Hadoop and Apache Flume with IBM MQ

by GS
in Apache, Flume, Hadoop, IBM, java, Linux, Messaging, MQ
on May 30, 2015

Over the past 2 years of working with Apache Hadoop, a few things have come up: folks wanting to use Apache Kafka, which definitely has its place in the Hadoop, Big Data, and next-generation technology spheres. But there is also the need to integrate with…

Why now?

by GS
in Apache, gempak, General, grads, Hadoop, IBM, Linux, tcp, Tuning, weather, Websphere
on January 15, 2014

What is the GS Tech Blog? It’s a place for me to rant and provide my thoughts about technology I’ve worked with over many years. So after working as a Technology Systems Engineer for almost 20 years, I decided it’s time to create a blog to publish some of my ideas and…
