Big Data Architect for various clients. Configuring and building data ingestion pipelines using big data tools such as Kafka, Flume, Sqoop, and Spark Streaming. Analytics using formula-based and machine learning algorithms: SparkR and Spark ML. Distributed computing and databases using Hadoop, Hive, HBase, and Spark in formats such as Parquet and Avro; NoSQL data storage using Couchbase.
- Big Data Architect at Accenture
- Information Technology Associate at Tata Consultancy Services (TCS)
5 years, 4 months at this Job
- Bachelor's - Electrical Engineering
• Architecting, managing, and delivering technical projects and products for various business groups.
• Loaded all data from our relational DBs into Hive using Sqoop. We received four flat files from different vendors, each in a different format (e.g., text, EDI, and XML).
• Architected all ETL data loads coming in from the source systems and loading into the data warehouse.
• Ingested data into Hadoop (HDFS/Hive) from different data sources.
• Created Hive external tables to stage data and then moved the data from staging to the main tables.
• The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
• Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
• Experienced in working with Apache Storm.
• Implemented all data quality rules in Informatica Data Quality.
• Involved in Oracle PL/SQL query optimization to reduce the overall run time of stored procedures.
• Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
• Migrated a petabyte-scale volume of data warehouse data to HDFS.
• Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehouse, and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful, valuable information for better decision-making.
• Experience in data cleansing and data mining.
• Designed AWS architecture and cloud migration, including AWS EMR, DynamoDB, Redshift, and event processing using Lambda functions.
• Worked on Flume, Storm, and Spark.
• Built proofs of concept to determine feasibility and evaluate Big Data products.
• Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
• Involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
• Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile, and network devices, and pushed it to HDFS.
• Designed the Redshift data model and performed Redshift performance analysis and improvements.
• Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
• Worked on configuring and managing disaster recovery and backup of Cassandra data.
• Developed Spark jobs to transform the data in HDFS (a brief sketch follows this list).
• Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
• Provisioned and configured infrastructure in AWS cloud and on-premise environments.
• Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for reporting on the dashboard.
• Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
• Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
• Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
• Developed Shell, Perl, and Python scripts to automate and provide control flow to Pig scripts. Environment: Hadoop, Hive, HDFS, HBase, Data Modeling, MapReduce (MRv1, MRv2), R, Python, ZooKeeper, Sqoop, Oozie, ETL, Cassandra, Teradata, and Talend Big Data
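Illustrative only: a minimal Spark (Scala) sketch of the kind of HDFS transformation job described in the list above. The paths, file format, and column names are hypothetical assumptions, not taken from the actual project.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HdfsTransformJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HdfsTransformJob")
      .getOrCreate()

    // Read raw records from HDFS (Parquet assumed; the real pipeline also handled Avro/text).
    val raw = spark.read.parquet("hdfs:///data/raw/transactions")

    // Example transformation: drop incomplete rows and aggregate amounts by day.
    val daily = raw
      .filter(col("amount").isNotNull)
      .groupBy(to_date(col("event_ts")).as("event_date"))
      .agg(sum("amount").as("total_amount"), count(lit(1)).as("txn_count"))

    // Write the transformed data back to HDFS for downstream Hive/reporting use.
    daily.write.mode("overwrite").parquet("hdfs:///data/curated/daily_transactions")

    spark.stop()
  }
}

A job like this would typically be scheduled through Oozie alongside the Hive and Pig steps mentioned above.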
- Big Data Architect at Hewlett Packard
- Big Data Architect at Continental North America
- AWS Cloud Architect at Capital One (Bank)
- Mainframes / Developer at Tech Mahindra
1 year, 8 months at this Job
Technology: Google Cloud Platform (BigQuery, Pub/Sub, Dataflow, Cloud Endpoints, GCS, ML APIs), Apache NiFi, PySpark, Azure, Airflow, Jupyter, Zeppelin, Hive, Python, Java, Scala
Team Size: 5
● Scotia Bank: Architected, designed, and migrated an on-prem HDP cluster architecture to a Google Cloud Platform architecture: 150 HDP nodes with Hive/Spark, Kerberos, Ranger, and Knox migrated to Google BigQuery, Dataflow, Dataproc/Hive, and Google IAM (a brief BigQuery read sketch follows this section).
● Blink Health: Architected, designed, and implemented an AWS data pipeline integrating Zendesk real-time data via Kinesis streaming pipelines into S3 and Redshift.
● Spireon: Architected and designed a new ODS solution in AWS for Spireon's IoT needs (1.5 million devices, 7 GB/sec ingestion rate).
● Cara: Architected a cloud big data solution for Cara (800 restaurants) to migrate their on-prem Microsoft DW SSAS cubes to an AWS/GCP cloud big data solution to meet their analytics needs.
● Equinix: Architected a new GCP architecture and implemented a POC covering batch, real-time, and ML use cases.
● Teck: Prepared truck sensor data (85 million records) in Cloud BigQuery for further ML failure predictions using TensorFlow.
● Kick-AaaS Product: Designed and implemented the Kick-AaaS big data analytics product using PySpark, Azure, NiFi, and Airflow, enabling data ingestion pipelines for Azure and Google Cloud DW analytics.
Primary Responsibilities:
● Architect, design, implement POCs, test, document, review, and define standards.
● Handle pre-sales, lead scrum, update stories, and work with various product groups.
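Illustrative only: a minimal Spark (Scala) sketch of reading a BigQuery table from Dataproc, in the spirit of the HDP-to-GCP migration above. It assumes the spark-bigquery connector is on the classpath; the project, dataset, and table names are hypothetical.

import org.apache.spark.sql.SparkSession

object BigQueryReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BigQueryReadSketch")
      .getOrCreate()

    // Read a BigQuery table through the spark-bigquery connector (bundled on Dataproc images).
    val events = spark.read
      .format("bigquery")
      .option("table", "my_project.analytics.events") // hypothetical project.dataset.table
      .load()

    // A query that might previously have run against on-prem Hive.
    events.groupBy("event_type").count().show()

    spark.stop()
  }
}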
- Big Data Architect at Pythian
- Principal Lead Engineer at Clickfox Inc
- Solution Architect/Dev Lead at Charles Schwab
- Tech Architect Manager at Truven Health Analytics/ IBM Watson
1 year, 6 months at this Job
• Involved in designing and architecting Big Data solutions using the Hadoop ecosystem.
• Collaborated in identifying current problems, constraints, and root causes in data sets to identify descriptive and predictive solutions using Hadoop HDFS, MapReduce, Pig, Hive, and HBase, and further developed reports in Tableau.
• Worked on analyzing the Hadoop cluster using different big data analytic tools, including Kafka, Sqoop, Storm, Spark, Pig, Hive, and MapReduce.
• Installed/Configured/Maintained Hortonworks Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
• Architected the Hadoop cluster in pseudo-distributed mode, working with ZooKeeper and Apache.
• Stored and loaded data from HDFS to Amazon AWS S3 for backup, and created tables in the AWS cluster with S3 storage.
• Utilized Big Data technologies to produce technical designs, prepared architectures and blueprints for Big Data implementation, and was involved in writing Scala programs using SparkContext.
• Provided technical assistance for configuration, administration and monitoring of Hadoop clusters.
• Involved in loading data from the Linux file system to HDFS and importing and exporting data into HDFS and Hive using Sqoop and Kafka.
• Implemented partitioning, dynamic partitions, and buckets in Hive, and supported MapReduce programs running on the cluster.
• Prepared presentations of solutions to Big Data/Hadoop business cases and presented them to company directors to get the go-ahead on implementation.
• Successfully integrated Hive tables and MongoDB collections and developed a web service that queries MongoDB collections and provides the required data to the web UI.
• Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
• Used Spark to create APIs in Java and Scala, streamed data in real time using Spark with Kafka, and developed Hive queries, Pig scripts, and Spark SQL queries to analyze large datasets.
• Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala (a brief sketch follows this list).
• Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra, and transferred data between Azure HDInsight and databases using Sqoop.
• Worked on debugging, performance tuning of Hive & Pig Jobs and implemented test scripts to support test driven development and continuous integration.
• Developed enhancements to MongoDB architecture to improve performance and scalability.
• Deployed algorithms in Scala with Spark using sample datasets and performed Spark-based development in Scala.
• Manipulated, cleansed, and processed source data and staged it in final Hive/Redshift tables; involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
• Used Storm to consume events coming through Kafka and generate sessions and publish them back to Kafka.
• Extracted feeds from social media sites such as Facebook and Twitter using Python scripts and designed end-to-end ETL workflows/jobs with the Cassandra NoSQL DB as a source.
• Involved in analysis, design and development phases of the project. Adopted agile methodology throughout all the phases of the application.
• Provisioned an Azure HDInsight cluster, connected to it, uploaded data, and ran MapReduce jobs.
• Gathered and analyzed the requirements and designed class diagrams, sequence diagrams using UML.
• Wrote Scala classes to interact with the database and Scala test cases to test the Scala code.
• Performed exceptional J2EE Software Development Life Cycle (SDLC) work on the application in web and client-server environments using J2EE.
• Used Kibana, a web-based data analysis and dashboarding tool for Elasticsearch, and used Logstash to stream data from one or many inputs, transform it, and output it to one or many outputs. Environment: Big Data, Hadoop, HDFS, Pig, Hive, MapReduce, Azure, Sqoop, Spark, Kafka, Linux, Cassandra, MongoDB, Scala, Storm, Elasticsearch, SQL, PL/SQL, AWS, S3, Informatica, Redshift.
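Illustrative only: a minimal Scala sketch of a Kafka-to-HDFS stream corresponding to the Spark Streaming bullet above. It uses the Structured Streaming API (the original work may have used the older DStream API); the broker, topic, and paths are placeholder assumptions.

import org.apache.spark.sql.SparkSession

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaToHdfsStream")
      .getOrCreate()

    // Subscribe to a Kafka topic (broker and topic names are placeholders).
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Persist the stream to HDFS as Parquet; Structured Streaming requires a checkpoint location.
    val query = stream.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streams/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()

    query.awaitTermination()
  }
}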
- Sr. Big Data Architect at Burlington Coat Factory
- Sr. Big data Architect at PNC Bank
- Sr. Java/Hadoop Developer at HSA Bank
- Sr. Java Developer at Coventry Health Care
2 years, 10 months at this Job
Industry: Health insurance
Environment: z/OS, Windows 7, Hadoop HDFS cluster with MapR 5.1
Duration: Nov 2016 to present
Role: Big Data Architect
Big Data Platform as a Service (BDPaaS) enables the creation of more efficient, accurate, cost-effective business analytics solutions, enabling the enterprise to focus on delivering critical solutions that make the overall healthcare system more efficient and help people lead healthier lives. It brings several best-of-breed big data and cloud technologies together in a highly scalable, enterprise-grade analytics platform. The platform includes foundational capabilities for data ingestion, data repository and processing, data discovery, search and visualization, data analytics, and data integration and enrichment. In addition to the foundational capabilities, the platform also includes system and data security and operations and maintenance.
• Guiding the full lifecycle of a Hadoop solution, including requirements analysis, data governance, capacity requirements, technical architecture design (including hardware, OS, and network topology), application design, code review, testing, and deployment, and executing the process of moving data from the mainframe to the Hadoop environment.
• Working with development and quality assurance team.
• Built new tenant platforms on project request (ingesting data from the data lake, data modeling, Pig scripts, Hive HQL, and HBase).
• Data processing using Spark, Pig scripts, and shell scripts.
• Created dashboards in Tableau and in Elasticsearch with Kibana. Project-2
- Big Data Architect at United Health Group
- Big Data Architect at State Farm Insurance
- Big Data Architect at State Farm Insurance
- Infrastructure Engineer at State Farm Insurance
2 years, 2 months at this Job
- M.C.A - Master of Computer Applications
- B.B.A - Business administration
• Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 2.0.0 for data aggregation and queries, writing data back into the RDBMS through Sqoop (a brief sketch follows this list).
• Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data
• Developed Oozie 3.1.0 workflow jobs to execute Hive 2.0.0, Sqoop 1.4.6, and MapReduce actions.
• Designed the big data authentication solution using LDAP/Kerberos and authorization using UNIX groups and HDFS ACLs.
• Architected Hadoop data security using Data Encryption Zones (DEZ) and controlled data access using UNIX groups and HDFS ACLs.
• Architected and designed the integration solution using Oozie coordinators and Control-M.
• Worked with the technology manager and business stakeholders to demonstrate the strategic value of the data lake platform.
• Led the platform and data migration from BigInsights 4.1 to BigInsights 4.2.
• Designed a code generation framework using UNIX shell and Python to automate the Hadoop code artifacts (BigSQL, Hive, HBase, Oozie coordinators, Oozie workflows).
• Designed data analysis and visualization using BigSQL DSM and IBM BigSheets.
• Worked with IBM Bluemix support to solve platform issues and apply required patches. Technology Used: IBM BigInsights 4.1/4.2, IBM InfoSphere DataStage, Python, shell script, Hive, HBase, BigSQL, Spark, Pig, BigSheets, IBM DSM, Control-M, Oozie, DB2, OS/360, Linux
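Illustrative only: a minimal Spark 2.x (Scala) sketch of a DataFrame aggregation with a UDF, in the spirit of the first bullet above. The table, columns, and UDF logic are hypothetical; the export back to the RDBMS would be a separate Sqoop step.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object AggregationWithUdf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AggregationWithUdf")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical source table in the data lake.
    val orders = spark.table("lake.orders")

    // Simple UDF that normalizes a free-text status column.
    val normalizeStatus = udf((s: String) => if (s == null) "UNKNOWN" else s.trim.toUpperCase)

    val summary = orders
      .withColumn("status", normalizeStatus(col("raw_status")))
      .groupBy("status")
      .agg(sum("amount").as("total_amount"))

    // Stage the aggregated result as a Hive table; the write back to the RDBMS
    // would be handled by a separate Sqoop export job.
    summary.write.mode("overwrite").saveAsTable("lake.order_status_summary")

    spark.stop()
  }
}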
- Sr. Big Data Architect at
- Sr. Data Architect at Veracity Englewood C
- Sr. Enterprise Data and Big Data Architect at Belk, Inc
- Integration/Data Architect at BJ's Wholesale Club
2 years, 4 months at this Job
- MS in Computer Science - Computer Science
Principal technical architect for an IoT end-to-end design and orchestration of user-based events and logs from hundreds of devices into multi-node HDInsight Spark clusters in Azure. Implemented the platform in Azure as an improved replacement for an existing Cloudera solution in AWS. Designed the real-time analytics and ingestion platform using IoT Edge, IoT Hub, and Stream Analytics into ADLS as HDFS. Implemented processing and storage throughput leveraging the Hadoop framework with Spark (Python, Scala, and SparkSQL), Sqoop, and Hive. Designed and implemented an enterprise Cloudera platform in Azure for large-scale data ingestion and processing, leveraging Director, Manager, Navigator, and Data Science Workbench with security. Real-time data ingestion with Kafka from sensors; Spark and Sqoop importing and processing data harnessed from legacy and traditional data stores into Hive, synced with Impala. Designed and built the machine learning workflow environment for Cloudera in Data Science Workbench. Architected and implemented a highly scalable Spark-based solution with Azure Databricks for processing terabytes of data, with a focus on deep learning analytics. Developed Spark Python modules for data engineering and ADLS for raw data storage; hot data persisted in Hive leveraging SparkSQL, cold store in ADW (a brief sketch follows this paragraph). Produce artifacts in support of reference architecture advocacy and implementation, including authoring documentation, white papers, and presentations/diagrams for dissemination to technical and business audiences.
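Illustrative only: a minimal Spark (Scala) sketch of landing raw ADLS data into Hive via SparkSQL, analogous to the Python data engineering modules described above. The storage account, container, paths, and table names are hypothetical, and cluster credentials for ADLS are assumed to be configured.

import org.apache.spark.sql.SparkSession

object AdlsToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AdlsToHive")
      .enableHiveSupport()
      .getOrCreate()

    // ADLS Gen2 path is a placeholder; storage credentials are assumed to be
    // configured on the cluster (e.g. via a service principal).
    val raw = spark.read.json("abfss://raw@mydatalake.dfs.core.windows.net/devices/")

    raw.createOrReplaceTempView("raw_devices")

    // Persist the "hot" subset to Hive with SparkSQL for downstream queries
    // (the `iot` database is assumed to exist).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS iot.device_events
        |USING parquet
        |AS SELECT * FROM raw_devices WHERE event_ts IS NOT NULL""".stripMargin)

    spark.stop()
  }
}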
- Senior Big Data Architect at Capax Global
- Principal Architect at BDP - AFNI - Zebra - Axalta
- Principal Architect at Shire Pharmaceuticals
- Sr. Data Architect at Komatsu - BARD Medical - Ball Horticultural - Turner Construction
1 year at this Job
- Master of Science - Information Systems
- Master of Science - Computer Science
Focus on Big Data architectures, data pipelines, Hadoop, Hive, HBase, Elasticsearch, Kafka, application process improvements, gathering business requirements, data quality, and managing work effort across two projects
• Collaborated with customers to gather, review and refine business requirements
• Designed Big Data architecture to support Customer business requirements
• Data architecture supported structured, semi-structured, and unstructured data and multiple access patterns
• Designed data pipelines to standardize data from multiple sources into a common format
• Loaded data into HBase and Elasticsearch, with HBase as the system of record (a brief indexing sketch follows this list)
• Shared the metastore between HBase and Hive, providing consistent data across the data platforms
• Stabilized legacy applications and reduced average data load times from weeks to seconds
• Managed Sprints across two projects and five developers
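Illustrative only: a minimal Spark (Scala) sketch of indexing standardized records into Elasticsearch, with HBase remaining the system of record as described above. It assumes the elasticsearch-hadoop (elasticsearch-spark) connector is available; the paths, index name, and host are hypothetical.

import org.apache.spark.sql.SparkSession

object IndexToElasticsearch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("IndexToElasticsearch")
      .getOrCreate()

    // Standardized records (already written to HBase as the system of record).
    val records = spark.read.parquet("hdfs:///data/standardized/customers")

    // Index the same records into Elasticsearch for search and dashboards.
    records.write
      .format("org.elasticsearch.spark.sql") // requires the elasticsearch-hadoop connector
      .option("es.nodes", "es-host:9200")
      .mode("append")
      .save("customers") // target index name

    spark.stop()
  }
}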
- Big Data Architect at RCG Global Services
- Data Architect at GE - Power Services Engineering
- BI Developer at The HOME Depot
- Data Architect at Sterling Information Systems
11 months at this Job
- - Leadership Strategies
Description: iLog is the front-end application advisors use to create repair cases for iPhone/iPad and Mac systems via phone, email, or chat.
• Participated in requirement gathering sessions with client business owners and provided input into the creation of Functional Requirement Documents.
• Design and develop the framework components
• Responsible for application architecture
• Provide solution to the problems in the project
• Wrote sections of the Technical Architecture document and System Design Documents
• Work with the team to resolve the technical problems
• Conducting code reviews
• Ensure the team follows proper coding standards and architecture guidelines (SONAR is used as a tool to monitor code coverage and coding standards).
• Define and Review delivery objectives, operations metrics, project schedule, timeline, status and manage IT service improvement initiatives.
• Create the Sprints and break down the tasks.
• Work on Agile Methodology. Created the Radars based on the Work priority.
• Capacity Planning
• Define and Review the application architectural design for the new requirements
• Create the InfoSec, App2App and Caesar requests based on project Needs.
• Create ACLs and validate server connectivity.
• Coordination with Customer & Sr. Management.
• Daily and Weekly Project reviews with client.
• Determining the resource requirements and hiring the required resources for the project teams.
• Proposal preparations based on client requirement.
• Mentoring and training.
• Prepare the Project Plans.
• Design and implement scalable Big Data architecture solutions for iLog application needs.
• Analyze multiple sources of structured and unstructured data to propose and design data architecture solutions for scalability, high availability, fault tolerance, and elasticity.
• Develop conceptual, logical and physical design for various data types and large volumes.
• Architect, design and implement high performance large volume data integration processes, database, storage, and other back-end services in fully virtualized environments.
• Work closely with customers, at a technical and user level, to design and produce solutions.
• Work closely with the product management and development teams to rapidly translate the understanding of customer data and requirements to product and solutions.
• Create Kafka topics, establish connections with the brokers, and read messages from the topics (a brief consumer sketch follows this list).
• Work with config and server properties.
• Create and launch the Kafka clusters. Environment: iOS 10, Java, Web Services, Hadoop, Scala, Spark, JSON, Cocoa, Splunk, Radar, Agile model (Scrum), MongoDB. Tools: Eclipse (Scala IDE)
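Illustrative only: a minimal Scala sketch of a Kafka consumer that connects to the brokers and reads messages from a topic, as in the bullets above. The broker address, group id, and topic name are placeholder assumptions.

import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

object RepairCaseConsumer {
  def main(args: Array[String]): Unit = {
    // Broker address, group id, and topic name are illustrative placeholders.
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("group.id", "ilog-repair-cases")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("repair-cases"))

    // Poll the topic and process each message (here, just print it); a real
    // consumer would add shutdown handling and close the consumer.
    while (true) {
      val records = consumer.poll(Duration.ofMillis(500))
      records.forEach(r => println(s"offset=${r.offset()} key=${r.key()} value=${r.value()}"))
    }
  }
}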
- Java, Big Data Architect at Apple Inc
- Onsite Delivery Lead/coordinator at Staples Inc
- Programmer, Sr Java Developer and Onsite coordinator at Nielsen Media Research
- Java UI Developer at British Telecom
1 year, 11 months at this Job
Responsibilities:
• Provide technical leadership and contribute to the definition, development, integration, test, documentation, and support across multiple platforms.
• Designed, architected, and implemented complex projects dealing with considerable data sizes (GB/PB) and high complexity.
• Provide deployment solutions based on customer needs, with sound knowledge of clustered deployment architectures.
• Able to guide and partner with VPs/Directors in architecting solutions for the Big Data organization.
• Created detailed AWS security groups which behaved as virtual firewalls controlling the traffic allowed to reach one or more AWS EC2 instances.
• Data modeling, design, implementation, and deployment of high-performance custom applications at scale on Hadoop/Spark.
• Data processing with MapReduce and Spark.
• Stream processing on Spark/Storm through the Kafka message broker.
• Review and audit of existing solutions, design, and system architecture.
• Perform profiling and troubleshooting of existing solutions.
• Create technical and design documentation.
• Creation of a user interface to search and/or view content within the cluster using SolrCloud.
• Worked on AWS, provisioning EC2 infrastructure and deploying applications behind Elastic Load Balancing.
• Cluster management and analytics in Cloudera and Hortonworks.
• Distributed database design, data modeling, development, and support on the DataStax Cassandra distribution (a brief read sketch follows this list).
• Weighed Cassandra's strengths and weaknesses to produce efficient schema designs that serve effective, high-performance queries.
• Maintain and work with our data pipeline that transfers and processes several terabytes of data using Spark, Scala, Python, Apache Kafka, Pig/Hive, and Impala.
• Apply data analysis, data mining, and data engineering to present data clearly.
• Ensure high-quality data and understand how data is generated out of experimental design and how these experiments can produce actionable, trustworthy conclusions.
• Full life cycle of data lake and data warehouse with big data technologies like Spark, Hadoop, and Cassandra.
• Working with Spark, RDDs, DataFrames, and data pipelines.
• Building complex ETLs, data warehousing, or custom pipelines from multiple data sources.
• Setting up connectors for security logs and Splunk data use cases.
• Building the Hadoop cluster (MTS) to host the three use cases.
• Analyzing the data using Tableau.
• Extract and analyze the data before loading it into the cluster.
• Review and understand data architecture, data models, source-to-target mapping rules, and match-and-merge rules.
• Evaluate Hadoop infrastructure requirements and design/deploy solutions (high availability, big data clusters, elastic load tolerance, etc.).
• Hadoop ecosystem components in our open-source infrastructure stack, specifically: HBase, HDFS, MapReduce, YARN, Oozie, Pig, Hive, Kafka, Storm, Spark, Spark SQL, and Flume.
• Estimate and obtain management support for the time, resources, and budget required to perform in different projects.
• Keep track of new requirements / changes in requirements of the project.
• Understand inbound and outbound data flow requirements, data models for landing, staging, and base objects, mapping documents, and match-and-merge rules.
• Proof of Concept (POC) and Proof of Technology (POT) execution and evaluation on MTS platforms.
• Installing and configuring required ecosystem tools for each use case. Environment: Big Data, Spark, YARN, Hive, Pig, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Cloudera, EMR, JDBC, Redshift, NoSQL, Sqoop, MySQL.
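Illustrative only: a minimal Spark (Scala) sketch of reading a Cassandra table, in the spirit of the DataStax Cassandra work listed above. It assumes the DataStax spark-cassandra-connector is on the classpath; the host, keyspace, and table are hypothetical.

import org.apache.spark.sql.SparkSession

object CassandraReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CassandraReadSketch")
      .config("spark.cassandra.connection.host", "cassandra-host") // placeholder host
      .getOrCreate()

    // Read a Cassandra table through the DataStax spark-cassandra-connector.
    val events = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "telemetry", "table" -> "events")) // illustrative names
      .load()

    events.groupBy("device_id").count().show()

    spark.stop()
  }
}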
Sr. Big Data/ Hadoop Developer
- Big data architect at Turner Broadcasting
- Front Controller, Service Controller at ADP - Florhan
- Service Controller at Orlando FL June
- Big data architect at Actionet
2 years, 2 months at this Job