Project Description: Target is a financial and retail services company offering online and in-store shopping for household products.
Module: Data analytics required on day-to-day, weekly, monthly, and yearly data.
Responsibilities:
• Involved in all phases of development activities from requirements collection to production support.
• Understood the current system and identified the different sources of data
• Involved in Cluster setup
• Performed batch processing of logs from various data sources using MapReduce
• Built predictive analytics to monitor inventory levels and ensure product availability
• Analyzed customers' purchasing behaviors
• Delivered value-added services based on clients' profiles and purchasing habits
• Defined UDFs using Pig and Hive to capture customer behavior (a Hive UDF along these lines is sketched below)
• Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
• Created Hive external tables on the MapReduce output, then applied partitioning and bucketing on top of them.
• Provided pivot graphs to show the trends
• Maintained data import scripts using Hive and MapReduce jobs
• Developed and maintained several batch jobs to run automatically depending on business requirements
• Imported and exported data between environments such as MySQL and HDFS
• Performed unit testing and deployment for internal use, monitoring the performance of the solution
Environment: EMR, Hive, Pig, Datameer, HDFS, Quartz, Java MapReduce, Maven, Core Java, Git, Jenkins, Unix, MySQL, Eclipse, Oozie, Sqoop, Flume
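For illustration, a Hive UDF for tagging customer behavior, as mentioned above, could look like the following minimal sketch in Scala; the class name, spend thresholds, and table/column names are assumptions, not the original code.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    import scala.util.Try

    // Hypothetical UDF: buckets a purchase amount into a coarse behavior segment.
    // Hive resolves evaluate() by reflection, so a plain method suffices.
    class SegmentCustomer extends UDF {
      def evaluate(amount: Text): Text = {
        if (amount == null) return null
        Try(amount.toString.toDouble).toOption match {
          case None => null // malformed rows yield NULL rather than failing the job
          case Some(spend) =>
            val segment =
              if (spend >= 500.0) "high_value"
              else if (spend >= 100.0) "regular"
              else "occasional"
            new Text(segment)
        }
      }
    }

    // Registration and use from the Hive CLI (jar path and table assumed):
    //   ADD JAR /path/to/behavior-udfs.jar;
    //   CREATE TEMPORARY FUNCTION segment_customer AS 'SegmentCustomer';
    //   SELECT customer_id, segment_customer(total_spend) FROM purchases;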
- Sr. Hadoop Developer at Minneapolis
- Hadoop Developer at Ericsson
- CMA/MOM Developer, SCRUM Master at Ericsson
- JCAT Developer at Tata Consultancy Services
11 months at this Job
- Master's - Information Systems
• Responsible for building scalable distributed data solutions using Hadoop.
• Worked as a Hadoop developer and admin on the Hortonworks (HDP 2242) distribution across 10 clusters ranging from POC to PROD.
• Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, and managing and reviewing data backups and log files
• Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
• Developed job processing scripts using Oozie workflow.
• Installed GitLab as the code repository.
• Implemented various SSIS transforms in packages, including Slowly Changing Dimension, Lookup, Fuzzy Lookup, Conditional Split, Derived Column, and Data Conversion.
• Implemented different analytical algorithms as MapReduce programs applied on top of HDFS data.
• Generated Tableau dashboards implementing quick/context filters and parameters
• Proficient with Tableau Server, Tableau Desktop, Tableau Online.
• Analyzed the behavior of the user by working with HiveQL on logs of Big Data.
• Strong understanding of dimensional data modeling, strong SQL optimization capabilities, and metadata management (Connections, Data Model, VizQL Model)
• Worked with the security team troubleshooting connectivity issues within LDAP & Ranger AD, the Knox gateway, ODBC/JDBC connectivity issues, and Kerberos accounts & keytabs.
• Worked with the data delivery team to set up new Hadoop users and Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig, and MapReduce access for the new users on the Hortonworks & Cloudera platforms
• Skilled in Tableau Desktop for data visualization, Reporting and Analysis; Cross Map, Scatter Plots, Geographic Map, Pie Charts and Bar Charts, Page Trails and Density Chart.
• Created HBase tables to store various data formats of data coming from different sources.
• Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
• Configured Spark Streaming to consume ongoing information from Kafka and stored the stream data to HDFS (see the sketch below).
• Developed data pipeline using Flume, Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis.
• Utilized SparkSQL to extract and process data by parsing using Datasets or RDDs in HiveContext, with transformations and actions (map, flatMap, filter, reduce, reduceByKey).
• Extend the capabilities of DataFrames using User Defined Functions in Python and Scala.
• Resolve missing fields in DataFrame rows using filtering and imputation.
• Integrate visualizations into a Spark application using Databricks and popular visualization libraries (ggplot, matplotlib).
• Achieved faster processing and testing of data by implementing Spark SQL and Spark using Scala.
• Experienced in adding/installing new components and removing them through Ambari.
• Experience with data wrangling and creating workable datasets.
• Monitoring systems and services through Ambari dashboard to make the clusters available for the business
• Implemented data wrangling, cleaning, transforming, merging and reshaping data frames.
• Generated reports in the Play framework using Highcharts by consuming REST APIs from other domains.
• Worked in Agile methodology and used iceScrum for development and tracking the project.
• Worked with HQL and the Criteria API for retrieving data elements from the database.
• Hands-on experience with cluster upgrades and patch upgrades without any data loss and with proper backup plans
• Configured different Notifications on AWS Services.
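The Kafka-to-HDFS streaming ingestion mentioned above might be wired roughly as in this sketch; the broker address, topic name, group id, batch interval, and output path are all assumptions.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(60))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",          // placeholder broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "hdfs-ingest",
          "auto.offset.reset"  -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // land each micro-batch as a time-stamped directory of text files on HDFS
        stream.map(_.value).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty())
            rdd.saveAsTextFile(s"hdfs:///data/streaming/events/${time.milliseconds}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }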
- Hadoop Developer at StateStreet
- Big Data Developer at Therma thru Doors
- Hadoop Developer at DOE - Albany, State of New York
- Hadoop Developer at Premier Inc
2 years, 1 month at this Job
Framingham, Massachusetts, USA.
Environment: Hadoop, Pig, Hive, Spark Core, Spark SQL, Scala, MySQL
Duration: Feb 2018 to date
Role: Hadoop Developer
This project is about rehosting Staples' existing system onto the Hadoop platform. Previously, Staples used a MySQL DB to store its competitor retailers' information (the crawled web data). Early on, Staples had only 4 competitor retailers, namely Amazon.com, walmart.com, etc.
But as the number of competitor retailers increases, the data generated from web crawling has also increased massively and can no longer be accommodated in a MySQL-class data store. For the same reason, Staples wants to move to Hadoop, where we can handle massive amounts of data by means of its cluster nodes and also satisfy the scaling needs of Staples' business operations.
Roles and Responsibilities:
• Moved all crawl data flat files generated from various retailers to HDFS for further processing.
• Developed Pig scripts to process the HDFS data.
• Created Hive tables to store the processed results in a tabular format.
• Used Sqoop scripts to move data between Hive/HDFS and the MySQL database.
• Wrote shell scripts for processing data and loading it to HDFS.
• Wrote CLI commands for working with HDFS.
• Developed UNIX shell scripts for creating the reports from Hive data.
• Completely involved in the requirement analysis phase.
• Setup Hive with MySQL as a Remote Metastore
• Moved all log/text files generated by various products into HDFS location
• Create External Hive Table on top of parsed data.
• Developed Spark programs using Scala APIs to compare the performance of Spark with Hive and SQL
• Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
• Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
• Used Spark SQL to load data, create SchemaRDDs, and load them into Hive tables, and handled structured data using Spark SQL
• Exported (wrote) the Spark-processed data to an RDBMS table (MySQL) for further reporting (see the sketch below).
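A hedged sketch of that last step, reading the Spark-processed results from Hive and writing them to MySQL over JDBC. The database, table, and credential names are assumptions, and this uses the newer SparkSession entry point rather than the SchemaRDD-era API the project originally used.

    import java.util.Properties

    import org.apache.spark.sql.SparkSession

    object ExportToMysql {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("CrawlDataReporting")
          .enableHiveSupport()
          .getOrCreate()

        // processed competitor-pricing results from a Hive table (names assumed)
        val report = spark.sql(
          """SELECT retailer, sku, AVG(price) AS avg_price
            |FROM crawl_db.parsed_prices
            |GROUP BY retailer, sku""".stripMargin)

        val props = new Properties()
        props.setProperty("user", "etl_user")                        // placeholder credentials
        props.setProperty("password", sys.env.getOrElse("DB_PASSWORD", ""))
        props.setProperty("driver", "com.mysql.jdbc.Driver")

        // write the result to MySQL for the reporting layer
        report.write.mode("overwrite")
          .jdbc("jdbc:mysql://reporting-db:3306/retail", "competitor_price_report", props)
      }
    }

PROJECT# II: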
- Hadoop Developer at Capgemini Pvt Ltd
- Software Engineer at Capgemini Pvt Ltd
- Hadoop Developer at Capgemini Pvt Ltd
- SCHOOL MANAGEMENT at Capgemini Pvt Ltd
1 year, 7 months at this Job
- B.Tech - CSE
• Processed data into HDFS by developing solutions, analyzed the data using MapReduce, Pig, and Hive, and produced summary results from Hadoop for downstream systems.
• Involved in ETL, Data Integration and Migration.
• Hands-on experience in the Cognos 10.x/8.x suite (Framework Manager, Cognos Transformer, Cognos Connection, Report Studio, Query Studio, Business Insight, Advanced Business Insight, Analysis Studio, and Event Studio) with expertise in metadata modeling: create the project, prepare metadata, prepare the business view, create and manage packages, set security, and publish to the portal.
• Work experience with Cognos Lifecycle Manager while migrating from Cognos 8.4 to Cognos 10.
• Responsible for managing data from multiple sources.
• Tested the Hadoop framework using MRUnit.
• Developed Pig and Hive queries as well as UDFs to pre-process the data for analysis.
• Ingested data into HDFS and Hive using Flume.
• Developed data quality monitoring and systems software in Python with Flask, working on news content systems and infrastructure.
• Provided cluster coordination services through ZooKeeper.
• Responsible for architecting Hadoop clusters with CDH4 on CentOS, managing with Cloudera Manager.
• Used Sqoop widely in order to import data from various systems/sources (like MySQL) into HDFS.
• Applied Hive queries to perform data analysis on HBase using the HBase storage handler to meet the business requirements.
• Created components like Hive UDFs for missing functionality in HIVE for analytics.
• Hands on experience with NoSQL databases like HBase and Cassandra and Amazon Web Services.
• Used different file formats like Text files, Sequence Files, Avro etc.
• Installed and configured Hadoop, MapReduce, and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch below).
Environment: Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Oozie, Cassandra, Java, Zookeeper, MySQL, Cognos.
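The data-cleaning MapReduce jobs were written in Java; a minimal map-only cleansing pass along the same lines is sketched here in Scala against the same Hadoop API. The field count and tab delimiter are assumptions.

    import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
    import org.apache.hadoop.mapreduce.Mapper

    // Drops malformed tab-delimited records, trims fields, and counts rejects.
    class CleaningMapper extends Mapper[LongWritable, Text, Text, NullWritable] {
      private val out = new Text()

      override def map(key: LongWritable, value: Text,
                       ctx: Mapper[LongWritable, Text, Text, NullWritable]#Context): Unit = {
        val fields = value.toString.split('\t')
        if (fields.length == 5 && fields.forall(_.trim.nonEmpty)) {
          out.set(fields.map(_.trim).mkString("\t"))
          ctx.write(out, NullWritable.get)
        } else {
          // surfaced in the job counters so data quality is visible per run
          ctx.getCounter("cleaning", "malformed_records").increment(1)
        }
      }
    }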
- Hadoop Developer at Lockheed Martin
- Hadoop Developer at Bank of America
- Hadoop developer at Johnson Controls
- Java & J2EE developer at Lahey Health
9 months at this Job
• Develop best practices for developing and deploying Hadoop applications and assist the team to manage compliance to the standards.
• Collected the data from the FTP server and loaded it into Hive managed and external tables.
• Worked with Sqoop import and export functionality to handle large data set transfers between the SQL Server database and S3.
• Created a custom UDF in Hive to mask customer PII data.
• Worked on ETL data Cleansing, Integration, and Transformation using Spark and Hive.
• Exported the transformed data to relational databases using Sqoop for visualization and to generate reports for the research and development team.
• Used shell scripting for automation.
• Developed an automation process to reconcile Sales DAGs against the input-source MSMQ feed from the Corp server (SQL Server) to find the missing transactions.
• Modified the Spark and Hive scripts to find sale transactions with promotions, and aggregated at the unit level to list units sold with respect to promotions for stakeholder reports.
• Developed PySpark scripts to flatten the data and load it into the clean layer.
• Developed Spark scripts and schemas to parse the JSON files and load the data into the stage layer (see the sketch below).
• Developed Spark and Hive scripts to apply the required transformations on the parsed stage-layer data and load it into the raw layer.
• Developed PySpark scripts to aggregate the sales data and created SNS topics for stakeholders to access the data.
• Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
• Wrote HiveQL to analyze the number of visitors to the digital store and their visit information, such as views and most-visited item pages.
• Experienced in loading and transforming of large sets of structured, semi structured and unstructured data.
• Migrated the old Pig scripts to Spark scripts.
• Developed Spark and Hive scripts to parse the store traffic data into the stage, raw, and clean layers.
• Attended the daily status meetings to discuss the status of pending issues.
• Worked as part of an agile team serving as a developer to customize, maintain and enhance a variety of applications for Hadoop.
• Created test scenarios, QE and reviewed the test results with business analysts.
• Used Jira for ticketing, issue tracking, and bug tracking; Bitbucket to check code in and out; and Jenkins for continuous integration and continuous deployment.
• Managed and scheduled jobs on Airflow.
Environment: Hive, Spark, Pig, EMR, S3, EC2, Sqoop, HDFS, PyCharm, Bitbucket, Jira, Airflow, SQL Server, Jenkins, UNIX Shell Scripting.
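A sketch of the JSON-to-stage-layer load: the resume's scripts were in PySpark and Spark, so this Scala version with an explicit schema is illustrative only, and the bucket paths and field names are assumptions.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    object SalesStageLoader {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("SalesStageLoader").getOrCreate()

        // declaring the schema up front avoids a full inference pass over the raw files
        val schema = StructType(Seq(
          StructField("store_id", StringType),
          StructField("txn_id", StringType),
          StructField("amount", DoubleType),
          StructField("ts", TimestampType)))

        spark.read.schema(schema)
          .json("s3://sales-raw/transactions/*.json")   // hypothetical raw bucket
          .write.mode("append")
          .partitionBy("store_id")
          .parquet("s3://sales-stage/transactions")     // hypothetical stage layer
      }
    }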
- Hadoop Developer at Anthem Inc
- Hadoop Developer at DELL EMC
- Hadoop Developer at Infosys Limited
1 year, 1 month at this Job
Description: AA is the world's largest airline after the merger with US Airways. Provided technical leadership and direction to team members from solution and conceptualization to implementation. Managed LUS & LAA integration efforts and project planning, and provided proof-of-concepts to reduce engineering churn. Gave extensive presentations about the Hadoop ecosystem, best practices, and data architecture in Hadoop. Provided mentorship and guidance to other development engineers and technical leaders. Debugged and solved issues with Hadoop as the on-the-ground subject matter expert; this could include everything from patching components to post-mortem analysis of errors.
Responsibilities:
• Responsible for building scalable distributed data solutions using Hadoop.
• Worked in the BI team in the area of Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
• Worked with Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing.
• Worked extensively on creating MapReduce jobs to power data for search and aggregation.
• Designed a data warehouse using Hive
• Handling structured, semi structured and unstructured data
• Worked extensively with Sqoop for importing and exporting the data from HDFS to Relational Database systems and vice-versa.
• Developed simple to complex MapReduce jobs using Hive and Pig.
• Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms (see the sketch below).
• Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted the data from MySQL into HDFS using Sqoop.
• Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
• Extensively used Pig for data cleansing.
• Created partitioned tables in Hive.
• Managed and reviewed Hadoop log files.
• Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
• Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
• Responsible for managing data coming from different sources
• Developed Pig UDFs to pre-process the data for analysis.
• Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
• Mentored analysts and the test team in writing Hive queries.
Environment: Hadoop, MapReduce, HDFS, Hive, HBase, Sqoop, Java (JDK 1.6), Pig, Flume, Oracle 11g/10g, DB2, Teradata, MySQL, Eclipse, PL/SQL, Java, Linux, Shell Scripting, SQL Developer, Toad, PuTTY, XML/HTML, JIRA
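One way the compression optimization above can be expressed, sketched in Scala against the Hadoop MapReduce API as a map-only identity job so the compression knobs stand alone; the codec choice and paths are assumptions.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.compress.{CompressionCodec, SnappyCodec}
    import org.apache.hadoop.io.{LongWritable, SequenceFile, Text}
    import org.apache.hadoop.mapreduce.Job
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, SequenceFileOutputFormat}

    object CompressedJobDriver {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        // compress intermediate map output to cut shuffle I/O
        conf.setBoolean("mapreduce.map.output.compress", true)
        conf.setClass("mapreduce.map.output.compress.codec",
          classOf[SnappyCodec], classOf[CompressionCodec])

        val job = Job.getInstance(conf, "compressed-passthrough")
        job.setJarByClass(CompressedJobDriver.getClass)
        job.setNumReduceTasks(0)                       // identity map-only pass
        job.setOutputKeyClass(classOf[LongWritable])
        job.setOutputValueClass(classOf[Text])

        // block-compressed SequenceFiles shrink what downstream jobs must read
        job.setOutputFormatClass(classOf[SequenceFileOutputFormat[LongWritable, Text]])
        SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK)
        FileOutputFormat.setCompressOutput(job, true)
        FileOutputFormat.setOutputCompressorClass(job, classOf[SnappyCodec])

        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }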
- Hadoop Developer at American Airlines
- Hadoop Developer at Capital One
- Hadoop Developer at Viral Heat
- Hadoop Developer at Apple
6 years at this Job
TriWest Healthcare is based in Phoenix, AZ; here I work as a Big Data Developer, developing clinical data with study monitors and/or on-site investigators using Hadoop, AWS, and Big Data technology.
Responsibilities:
• As a Big Data/Hadoop Developer, working on Hadoop ecosystems including HBase, Hive, Spark Streaming, and the MapR distribution.
• Worked on Big Data infrastructure for batch processing as well as real-time processing; responsible for building scalable distributed data solutions using Hadoop.
• Involved in development of the Hadoop system and improving multi-node Hadoop cluster performance.
• Used Spark to create structured data from large amounts of unstructured data from various sources.
• Deployed MapReduce and Spark jobs on Amazon Elastic MapReduce using datasets stored on S3.
• Used Amazon CloudWatch to monitor and track resources on AWS.
• Utilized Agile and Scrum methodology to help manage and organize a team of developers, with regular code review sessions.
• Created managed tables and external tables in Hive and loaded data from HDFS.
• Monitored workload, job performance, and capacity planning using Cloudera Manager.
• Created Hive tables and applied HiveQL on those tables, which invokes and runs MapReduce jobs automatically.
• Deployed the application in Hadoop cluster mode using spark-submit scripts.
• Worked on the large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark and Hive.
• Optimized the Hive tables using optimization techniques like partitioning and bucketing to provide better performance.
• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and MapReduce (see the sketch below).
• Involved in cluster maintenance, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
• Upgraded the Hadoop cluster from CDH3 to CDH4, set up a High Availability cluster, and integrated Hive with existing applications.
• Performance-tuned Hive queries and MapReduce programs for different applications.
• Designed & developed a flattened view (merged and flattened dataset), de-normalizing several datasets in Hive/HDFS.
• Used a test-driven approach for developing the application and implemented the unit tests using the Python unittest framework.
• Involved in ad hoc stand-ups and architecture meetings to set up daily priorities and track the status of work as part of a highly agile work environment.
Environment: Hadoop 3.0, Spark 2.3, Hive 2.3, MapReduce, YARN, HDFS, AWS, S3, HBase 2.1, CDH3, CDH4, Python.
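A hedged sketch of the Hive-to-Spark conversion bullet above: the same aggregation expressed once in HiveQL and once as RDD transformations. The table and column names are assumptions.

    import org.apache.spark.sql.SparkSession

    object HiveToSparkExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToSparkExample")
          .enableHiveSupport()
          .getOrCreate()

        // original-style HiveQL aggregation
        val hiveCounts = spark.sql(
          "SELECT claim_type, COUNT(*) AS n FROM clinical.claims GROUP BY claim_type")

        // the equivalent rewritten as Spark RDD transformations
        val rddCounts = spark.table("clinical.claims").rdd
          .map(row => (row.getAs[String]("claim_type"), 1L))
          .reduceByKey(_ + _)

        hiveCounts.show(20)
        rddCounts.take(20).foreach(println)
      }
    }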
- Big Data/Hadoop Developer (AWS) at Triwest Healthcare
- Hadoop Developer at T- Mobile
- Spark/Scala Developer (Azure) at Wells Fargo
- Java Developer at Virtusa Corporation
1 year, 2 months at this Job
CIGNA Information Management Application:
CIGNA - Connecticut General Life Insurance Company is one of the largest healthcare providers in the United States, providing managed medical and dental care products, group indemnity health insurance, and related services. CIGNA Health Care (CHC) is the largest line of business within CIGNA.
The Cigna Information Management Application (CIMA) is a centralized enterprise data store built with the Hadoop ecosystem, collecting data from multiple data sources and parsing it to enterprise standards; it is used for statistical and predictive analysis and produces reports that support decisions by the sales and underwriting teams.
Responsibilities:
• Understanding the business requirements by participating in the business requirements review meetings conducted by SMEs and system architects.
• Preparing an understanding document, including the design plan for the set of Hadoop jobs required to fulfill each business requirement, by reviewing both the business requirement documents and the functional requirement documents.
• Preparing the estimates for the deliverables based on the understanding document.
• Participating in the design phase to prepare low level design documents by working closely with the business team.
• Reviewing low level design documents with the SME's and System Architects to make sure all are on the same page before starting development phase.
• Defining the execution sequence of the Hadoop Jobs developed using the Map Reduce techniques in Java language.
• Preparing the technical specs from the low-level design documents by understanding the inputs and outputs of the Hadoop jobs.
• Developing the MapReduce programs in the Java language as specified in the technical spec documents.
• Getting confirmation from system architects if any showstoppers are identified during construction, and updating the clarifications in the set of documents prepared during earlier phases of the SDLC.
• Preparing test data for the set of Hadoop jobs by copying sample production data to the local file system and copying the same data into the HDFS file system in the unit region
• Completing unit testing for the new Hadoop jobs on the standalone cluster designated for the unit region using MRUnit.
• Supporting the QA team to complete system testing when they encounter any roadblocks.
• Debugging and identifying issues reported by QA with the Hadoop jobs by configuring them against the local file system.
• Copying sample production data generated for a limited period of time, using Flume, to the E2E-region Hadoop cluster.
• Validating results in E2E testing, which helps us identify most of the issues we would encounter in the production environment.
• Reviewing the E2E results to identify performance issues with the new code and preparing performance suggestion documents.
• Providing post-implementation, enhancement and maintenance support to client for application
• Developed a customized Hadoop program to perform different file operations on HDFS, equivalent to Linux shell commands, to make the team's job of setting up test data easier.
• Developed two complex programs that convert claims received from multiple claim engines into a uniform format, developing custom keys and values and sorting all claim activities so that multiple activities on the same type of claim land in a single file, using secondary sorting techniques.
• Developed a custom partitioner to route claims with the same disposition and from the same claim engine to the same reducer using the composite-key concept (see the sketch below).
• Developed an automation process to merge the multiple output files from the different reducers into a single output file on HDFS.
• Facilitated functional and technical knowledge transfer sessions.
• Experience in managing and reviewing Hadoop log files.
• Experience using Pig and Hive in a MapReduce context.
• Experience running Hadoop streaming jobs to process terabytes of XML-format data.
• Gained good experience with NoSQL databases.
• Set up and benchmarked Hadoop clusters for internal use.
Environment: Java, Eclipse, Hadoop, MapReduce, HDFS, Informatica, Oozie, Flume, Windows NT, Linux, UNIX Shell Scripting, Hive 0.13.
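A minimal sketch of the custom partitioner described above, assuming a composite text key of the form claimEngine|disposition|activitySeq; the delimiter and key layout are assumptions, and the matching sort and grouping comparators are omitted.

    import org.apache.hadoop.io.Text
    import org.apache.hadoop.mapreduce.Partitioner

    // Partitions on the "natural" part of the composite key (engine + disposition),
    // so all activities of the same claim stream reach the same reducer while the
    // secondary sort orders them by activity sequence.
    class ClaimPartitioner extends Partitioner[Text, Text] {
      override def getPartition(key: Text, value: Text, numPartitions: Int): Int = {
        val naturalKey = key.toString.split('|').take(2).mkString("|")
        (naturalKey.hashCode & Integer.MAX_VALUE) % numPartitions
      }
    }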
- Hadoop Developer at CIGNA Health Care
- Hadoop Developer at CIGNA Health Care
- Java Developer at CIGNA Health Care
- Java Developer at CIGNA Health Care
5 years, 6 months at this Job
Every action we take as 1ACI redefines what's possible.
As Innovators, we Envision possibilities. As Developers, we Build possibilities.
As Leaders, we Empower possibilities.
At ACI, we're not just driving payments at the speed of change.
We're Making Possibilities Happen. Our people are the core of our business. Our 1ACI team represents a globally diverse, passionate and dedicated group of thousands of individuals around the world who share a common commitment to making our customers successful by driving the future of payments. As a Sr. Hadoop Developer in East Brunswick, NJ you can help make possibilities happen.
• Involved in gathering business requirements from Stakeholders and Subject Matter Experts.
• Installed and configured Hadoop MapReduce, HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
• Worked on analyzing Hadoop cluster and different big data analytical and processing tools including Pig, Hive, Sqoop, Python and Spark with Scala and Java.
• Worked on the large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark, Hive, and HBase.
• Wrote multiple MapReduce programs for data extraction, transformation and aggregation from XML, JSON, CSV & other compressed file formats.
• Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
• Created Hive queries to compare raw data with EDW reference tables & perform aggregates.
• Developed Sqoop scripts to import data from relational sources & handle incremental loads.
• Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processed the data with Pig.
• Developed and deployed Oozie Workflows for recurring operations on Clusters.
• Developed a generic utility in Spark for pulling data from an RDBMS system using multiple parallel connections (see the sketch below).
• Worked on importing and exporting data from Oracle into HDFS and Hive using Sqoop for analysis, visualization and to generate reports.
• Involved in piloting Hadoop cluster hosted on Amazon Web Services.
• Created detailed AWS security groups, which behaved as virtual firewalls controlling the traffic allowed to reach one or more AWS EC2 instances.
• Used Apache NiFi to create a pipeline that consumes data from the source, processes it, and stores it into HBase tables on AWS using Kafka.
• Developed business services using Java RESTful web services with Spring MVC framework.
• Responsible for managing and scheduling Jobs to remove the duplicate log data files in HDFS using Oozie.
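The generic parallel-pull utility could reduce to something like this sketch: Spark opens numPartitions JDBC connections, each scanning one slice of the partition column's range. The URL, table, bounds, and credentials are all placeholders.

    import org.apache.spark.sql.SparkSession

    object ParallelJdbcPull {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ParallelJdbcPull").getOrCreate()

        val payments = spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // placeholder URL
          .option("dbtable", "PAYMENTS")                          // assumed source table
          .option("user", "etl_user")
          .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
          .option("partitionColumn", "PAYMENT_ID")  // must be numeric, date, or timestamp
          .option("lowerBound", "1")
          .option("upperBound", "100000000")
          .option("numPartitions", "16")            // 16 parallel connections
          .load()

        payments.write.mode("overwrite").parquet("hdfs:///data/raw/payments")
      }
    }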
- Sr. Hadoop Developer at BBVA Compass
- Sr. Hadoop Developer/ Admin at American Family Insurance
- Hadoop Developer at Northwestern Mutual
- Hadoop Developer at TransUnion
1 year, 3 months at this Job
• As a Spark/Hadoop Developer worked on Hadoop eco-systems including Hive, MongoDB, Zookeeper, Spark Streaming with MapR distribution.
• Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
• Involved in various phases of development; analyzed and developed the system following Agile Scrum methodology.
• Involved in designing the row key in HBase to store Text and JSON as key values in HBase table and designed row key in such a way to get/scan it in a sorted order.
• Used CloudWatch Logs to move application logs to S3 and created alarms based on a few exceptions raised by applications.
• Used Kibana, an open-source browser-based analytics and search dashboard for Elasticsearch.
• Maintain Hadoop, Hadoop ecosystems, and database with updates/upgrades, performance tuning and monitoring.
• Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
• Prepared data analytics processing, and data egress for availability of analytics results to visualization systems, applications, or external data stores.
• Built large-scale data processing systems in data warehousing solutions and worked with unstructured data mining on NoSQL.
• Responsible for design and development of Spark SQL Scripts based on Functional Specifications.
• Used AWS services like EC2 and S3 for small data sets processing and storage.
• Provisioned a Cloudera Director AWS instance and added the Cloudera Manager repository to scale up the Hadoop cluster in AWS.
• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, and Scala.
• Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
• Specified the cluster size and resource-pool allocation for the Hadoop distribution by writing the specification texts in JSON file format.
• Developed Spark Applications by using Scala, Java, and Implemented Apache Spark data processing project to handle data from various RDBMS and Streaming sources.
• Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
• Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries.
• Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
• Used Spark SQL on data frames to access Hive tables in Spark for faster processing of data.
• Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
• Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
• Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
• Developed data pipeline using MapReduce, Flume, Sqoop and Pig to ingest customer behavioral data into HDFS for analysis.
• Used Different Spark Modules like Spark core, Spark SQL, Spark Streaming, Spark Data sets and Data frames.
• Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL database for huge volume of data.
• Designed and developed automation test scripts using Python
• Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
• Used the Spark-Cassandra Connector to load data to and from Cassandra (see the sketch below).
• Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
• Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
• In the preprocessing phase of data extraction, used Spark to remove missing data and transform the data to create new features.
• Developed the batch scripts to fetch the data from AWS S3 storage and do required transformations in Scala using Spark framework.
• Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
• Writing Pig scripts to transform raw data from several data sources into baseline data.
• Analyzed the SQL scripts and designed the solution for implementation using PySpark
• Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
• Extracted large volumes of data feed on different data sources, performed transformations and loaded the data into various Targets.
• Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts
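The Spark-Cassandra Connector usage mentioned above might look like this minimal round-trip sketch with the DataStax connector's RDD API; the keyspace, tables, and column names are assumptions.

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    object CassandraRoundTrip {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("CassandraRoundTrip")
          .set("spark.cassandra.connection.host", "cassandra-host") // placeholder
        val sc = new SparkContext(conf)

        // read rows as CassandraRow objects and keep only active members
        val active = sc.cassandraTable("member_ks", "members")
          .filter(_.getString("status") == "ACTIVE")

        // project two columns and write them back to a second table
        active.map(row => (row.getString("member_id"), row.getString("status")))
          .saveToCassandra("member_ks", "active_members", SomeColumns("member_id", "status"))
      }
    }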
- Sr. Spark/Hadoop Developer at UHG
- Sr. Big Data/Hadoop Developer at Comcast
- Hadoop Developer at Wells Fargo
- Sr. Java/Hadoop Developer at Caterpillar
1 year at this Job