Hadoop Engineer / Hadoop Administrator (AI/ML group)

The AI/ML group at Verus works on problems at the intersection of data extraction, transformation, and loading (ETL), artificial intelligence, data mining, data analysis, and software performance analysis. We are looking for a Hadoop Engineer and Administrator to manage a Hortonworks Data Platform (HDP) cluster (fewer than 10 nodes) and to architect software solutions that run on that cluster.

You have experience architecting, building, deploying, and managing highly available databases, servers, and storage. Additionally, you have a passion for providing solutions through automation and believe in practicing a DevOps philosophy. You understand that success comes through collaboration and communication. Your background likely includes system administration, experience with relational and NoSQL databases, and experience with monitoring tools for those environments. Prior work experience in a regulated industry is desirable.

In this role you will work directly with Research Scientists, Engineers, and Data Analysts to optimize the interaction and sharing of responsibilities between humans and machines. You must demonstrate fluency in Python and Java: key responsibilities of this role are building custom automations and extensions to open-source technology platforms, and validating ideas by designing and interpreting experiments and implementing distributed algorithms.

Responsibilities

  • Configure and administer the Hortonworks Data Platform (HDP) and its supporting ecosystem.
  • Develop, implement, and support highly available platforms on which to deploy HDP and supporting ecosystems in a multi-tenant cloud environment.
  • Automate the deployment, configuration, and support of HDP.
  • Continually improve and optimize the performance of HDP and supporting ecosystems.
  • Work directly with technical and business resources to devise and recommend solutions based on analysis of requirements.
  • Work alongside scientists, engineers and analysts to bring their ideas to life by implementing algorithms, running experiments and building prototypes.
  • Perform proactive analysis to prevent exposure to known software issues.
  • Analyze complex distributed production deployments, and make recommendations to optimize performance.
  • Upgrade the existing HDP cluster to newer versions of the software.
  • Set up new HDP users, including creating Kerberos principals and testing HDFS, Hive, HBase, and YARN access for the new users (a minimal scripting sketch follows this list).
  • Configure HDP security with Ranger, Kerberos, Knox, and Atlas.
  • Monitor and optimize cluster utilization, performance and operations.
  • Promote and establish DevOps culture and practices.
  • Write and maintain technical documentation, administration runbooks, and knowledge base articles.
  • Keep current with Hadoop and big data ecosystem technologies.
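
To illustrate the user-onboarding responsibility above, here is a minimal sketch in Python, assuming an MIT Kerberos KDC administered via kadmin.local; the realm, keytab path, and username are hypothetical placeholders, and Hive or HBase smoke tests would follow the same pattern (e.g., via beeline).

    #!/usr/bin/env python3
    """Minimal sketch of new-HDP-user onboarding.

    Assumptions (placeholders, not part of this posting): the EXAMPLE.COM
    realm, the keytab directory, and the user "alice". Requires an MIT
    Kerberos KDC reachable via kadmin.local and HDFS superuser rights.
    """
    import subprocess

    REALM = "EXAMPLE.COM"  # hypothetical Kerberos realm

    def run(cmd):
        """Run a command, echo it, and fail loudly on a non-zero exit."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    def onboard(user):
        principal = f"{user}@{REALM}"
        keytab = f"/etc/security/keytabs/{user}.keytab"  # hypothetical path
        # 1. Create the principal with a random key and export its keytab
        #    (run on the KDC host as root).
        run(["kadmin.local", "-q", f"addprinc -randkey {principal}"])
        run(["kadmin.local", "-q", f"xst -k {keytab} {principal}"])
        # 2. Provision the HDFS home directory (run as the hdfs superuser).
        run(["hdfs", "dfs", "-mkdir", "-p", f"/user/{user}"])
        run(["hdfs", "dfs", "-chown", f"{user}:{user}", f"/user/{user}"])
        # 3. Smoke-test access with the new credentials.
        run(["kinit", "-kt", keytab, principal])
        run(["hdfs", "dfs", "-ls", f"/user/{user}"])  # HDFS access
        run(["yarn", "application", "-list"])         # YARN visibility

    if __name__ == "__main__":
        onboard("alice")  # hypothetical user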

Expectations

  • Manage individual project priorities, deadlines, and deliverables.
  • Adapt to changes and setbacks in order to manage pressures and meet company requirements.
  • Demonstrate and apply principled software engineering practices.
  • Be versatile and passionate about tackling new problems and learning new technologies.
  • Have a strong sense of ownership and drive.
  • Apply sharp problem-solving skills to tackle new problems effectively while hiding complexity from users.
  • Treat automation as second nature.
  • Be metrics-driven and focused on continual improvement.
  • Drive problem resolution to root cause and take initiative in identifying and fixing broken processes.
  • Comfortable collaborating and working with distributed teams.
  • Excellent verbal and written communication skills.

Minimum qualifications

  • BA/BS degree in Computer Science or a related technical field, or equivalent practical experience (3 years of work experience for every 1 year of education).
  • 2+ years of hands-on experience with the Hadoop infrastructure stack (e.g., HDFS, MapReduce, HBase, Flume, Spark, Pig, Hive, Oozie, YARN, ZooKeeper, Presto).
  • 2+ years of experience with design patterns commonly used in Hadoop-based deployments.
  • 2+ years of HDP installation and administration experience in multi-tenant production environments.
  • 3+ years of experience with Python, Java, SQL, and shell scripting.
  • 2+ years of experience with DevOps practices and tools (e.g., Git, Jenkins, Maven, Ant).
  • Strong experience with Ambari, Ranger, Kerberos, Knox and Atlas.
  • Strong experience with relational databases such as Oracle or MS SQL Server.
  • Strong experience with various enterprise security solutions such as LDAP and Active Directory.
  • Strong expertise in leveraging a wide variety of open source technologies.
  • Strong understanding of network configuration, devices, protocols, speeds, and optimization.
  • Strong troubleshooting skills, with an understanding of HDP capacity, bottlenecks, memory utilization, CPU usage, OS, storage, and networking.
  • Understanding of on-premises and cloud network architectures.

Preferred qualifications

  • Advanced degree (MS/ME) in Computer Science or related technical field.
  • 3+ years of hands-on experience with the Hadoop infrastructure stack.
  • 3+ years of HDP installation and administration experience in multi-tenant production environments.
  • 3+ years of experience with Enterprise Linux system administration (CentOS/RHEL).
  • Hortonworks HDP Administration certification or equivalent.
  • Proven ability to build curated data models for BI and analytics.

Interested in this position?
Please email your résumé to careers@vanalytics.com