This project develops and builds dashboards, automated reports, and report templates using advanced Tableau functions; provides guidance and insight on data visualization and Tableau dashboard design best practices; and supports a multi-environment Tableau infrastructure, including server security, troubleshooting, and general system maintenance.
• Analyzed end-user requirements and communicated and modeled them for the development team.
• Retrieved data from a Hadoop cluster by developing a pipeline using Hive (HQL), used SQL to retrieve data from an Oracle database, and applied ETL for data transformation.
• Performed data wrangling to clean, transform, and reshape data using the pandas library. Analyzed data using SQL, R, Java, Scala, Python, and Apache Spark, and presented analytical reports to management and technical teams.
• Worked with datasets of varying complexity, including both structured and unstructured data, and participated in all phases of data mining: data collection, data cleaning, variable selection, feature engineering, model development, validation, and visualization.
• Developed predictive models on large-scale datasets to address various business problems by leveraging advanced statistical modeling, machine learning, and deep learning.
• Analyzed historical data using machine learning algorithms such as clustering, multiple linear regression, logistic regression, SVM, Naive Bayes, random forests, K-means, and KNN.
• Conducted exploratory data analysis using Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy, and NLTK in Python while developing various machine learning models.
• Implemented data quality validation techniques to validate data and identified numerous anomalies. Worked extensively with statistical analysis tools; adept at writing code in advanced Excel, R, and Python.
• Performed model validation using test and validation sets via k-fold cross-validation and statistical significance testing.
• Worked with various kinds of data (open source as well as internal); developed models for labeled and unlabeled datasets, and worked with big data technologies such as Hadoop and Spark and cloud resources such as Azure and Google Cloud.
• Used F-score, AUC/ROC, confusion matrices, precision, and recall to evaluate the performance of different models.
• Built multi-layer neural networks in Python using the Scikit-learn, Theano, TensorFlow, and Keras packages to implement machine learning models.
• Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data.
• Created complex charts and graphs with drill-downs that allow various divisions to quickly locate outliers and correct any anomalies.
• Developed stored procedures, functions, views, triggers, and complex SQL queries using SQL Server, T-SQL, and Oracle PL/SQL.
• Worked with various data sources across multiple relational databases (Oracle 11g/10g/9i, MS SQL Server), loading relational and flat files into the staging area, ODS, data warehouse, and data marts.
• Designed and developed standalone data migration applications to retrieve data from Azure Table/Blob storage for processing in Python and reporting in Power BI.
• Used the R programming language to critique data graphically and performed data mining. Interpreted business requirements and data mapping specifications; responsible for extracting data per business requirements.
• Participated in feature engineering, including feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing; performed data imputation using various methods from the Scikit-learn package in Python.
• Utilized the Informatica toolset (Informatica Data Explorer and Informatica Data Quality) to inspect and profile legacy data.
• Used Teradata utilities such as FastExport and MLOAD to handle data migration/ETL tasks from OLTP source systems to OLAP target systems.
• Created reports, dashboards, and data visualizations using Tableau to explain and communicate data insights, significant features, and model scores and performance clearly to both technical and business teams.
Environment: Python 3.6.4, R Studio, MLLib, Regression, NoSQL, SQL Server, Hive, Hadoop Cluster, ETL, Spyder 3.6, Agile, Tableau, Java, NumPy, Pandas, Matplotlib, Power BI, Scikit-Learn, Seaborn, e1071, ggplot2, Shiny, TensorFlow, AWS, Azure, HTML, XML, Informatica Power Center, Teradata.
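The validation workflow described in the bullets above (k-fold cross-validation plus precision/recall/F-score evaluation) can be sketched in plain Python. This is an illustrative stand-in for the scikit-learn workflow, not the original project's code; fold counts and labels here are made up.

```python
# Minimal sketch: k-fold index splitting and binary-classification metrics.
# Pure-Python stand-in for sklearn's KFold and precision/recall/F1 scorers.

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 from parallel 0/1 label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

In practice each fold serves once as the validation set while the model trains on the rest; the metrics are then averaged across folds.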
- Sr. Data Engineer at Cummins INC
- Data Engineer/Data Analyst at CVS Health
- Scrum Business Systems Analyst at Anthem
- Data Analyst at SidSoft Technologies Private Ltd
1 year, 8 months at this Job
• Working as a Data Engineer Consultant
• Worked as trainer/teaching faculty for 2 years
- Data Engineer at Data Engineer Consultant
- Master of Technology - Technology
• Maintain and model data storage (Dynamo and MySQL)
• Develop and maintain microservices deployed in AWS (serverless, EC2), designed for real-time front-end service as well as scheduled services to process data
• Develop scripts in bash to streamline code deployment and service setup in various server environments
• Develop data pipeline services to support data warehousing in Google Cloud BigQuery
• Develop and maintain Google Cloud BigQuery data model and views
• Main technologies used: Python 3.6, AWS, Google Cloud, Gearman, MySQL, Bitbucket, JIRA
- Data Engineer at Planoly
- Software Engineer at Volusion
- Software Engineer at Schlumberger
- Software Developer Intern at SD Optosys
7 months at this Job
- Bachelor of Science - Computer Science
- Masters - Computer Science
Western Industrial X-ray, Inc - Fairfield, CA
• Design/maintain an Access database of inventory and equipment for quality control, tracking required equipment calibrations and equipment location (in-house and at vendor repair).
• Produce company reports with other Microsoft programs as required.
• Design/maintain company website (wixinc.net); general accounting.
Administrator Assistant
Carson Landscape Industries - Sacramento, CA
• Computer tech support and administrative assistance: proposals, database entry, and other duties.
SMTS, Analyst/Programmer
Computer Science Corporation - Edwards AFB
• Team Lead
• Provide data analysis, QA/QC, customer service, testing for special projects, new-software testing, training, and other duties as needed.
- Data Engineer at Western Industrial X-ray, Inc
6 years, 9 months at this Job
- Bachelor of Science - Physics
Everything But The House (EBTH), Data Engineer, Cincinnati, OH, June 2018 - Present
• Designed the scalable infrastructure required for optimal ETL of data using Airflow, to move data from a variety of data sources to the data warehouse
• Developed and automated custom Airflow operators and DAGs to read Kafka streams from S3, translate them to Snowflake command files and execute them onto the data warehouse
• Migrated data warehouse from Redshift to Snowflake, which increased querying performance by 30%
• Implemented parallel execution of DAGs for batch processing each target table files to improve ETL performance by more than 50%
• Implemented error handling and made the infrastructure idempotent and deterministic to minimize schema mismatch and data loss in case of hardware and network failure
• CI/CD of Airflow DAGs and custom operators via Docker to staging and production environments
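The parallel batch execution described above can be illustrated with a stdlib-only sketch. The `load_table` function and table names are hypothetical stand-ins for "translate Kafka stream files from S3 into Snowflake command files and execute them"; the real job used Airflow DAGs.

```python
# Sketch: process each target table's batch in parallel, mirroring
# parallel DAG task execution. load_table is an illustrative placeholder.
from concurrent.futures import ThreadPoolExecutor

def load_table(table):
    # Placeholder for the per-table ETL work (read stream files,
    # build warehouse commands, execute them).
    return f"{table}: loaded"

def run_batches(tables, max_workers=4):
    """Run the per-table loads concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load_table, tables))
```

Because the tables are independent, concurrency bounds total wall time by the slowest table rather than the sum of all of them, which is where the >50% ETL speedup in such setups typically comes from.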
- Data Engineer at Everything But The House (EBTH)
- Co-founder & Front-End Developer at Delfinite
- Database Developer at TEVA Pharmaceutical
- Research Assistant at Novel Device Laboratory
7 months at this Job
- B.S. - Computer Science
Provides business intelligence solutions to organizations involved in clinical research, such as pharma companies, cancer centers, and academic medical centers. Its main mission is to give research organizations a unified BI and reporting platform that aids enterprise visibility and improves trial performance by covering the entire research life cycle.
Role: Data Engineer
• First employee of the startup, responsible for architecting and developing the entire product line, including the LAVIS Enterprise Data Warehouse, a suite of comprehensive business intelligence applications, and standardized reports.
• Actively responsible for architecting the Enterprise Data Warehouse, developing and extending data models by integrating the various source systems and maintaining proper relationships between the data marts.
• As part of data extraction through ETL procedures, used SSIS extensively to populate the various data marts through incremental or full loads, depending on the nature of the data sources.
• Designed and developed complex SSIS Packages to transfer data from public data sources to populate the staging areas.
• Extensively used SSIS transformations such as Lookup, Derived column, Data conversion, Aggregate, Conditional split, SQL task, Script task and Send Mail task etc.
• Developed an error-logging framework that provides notifications for the daily SSIS refresh jobs, easing error handling and significantly reducing debugging time.
• Gathered business requirements with clients to design and develop complex reports using Microsoft SQL Server Reporting Services with minimal report loading and retrieval times.
• Performed performance tuning of T-SQL scripts and queries using SQL Server query plans, system catalog views, SQL Server Profiler, and Windows Performance Monitor to improve database performance, which in turn improved the execution of the entire data pipeline.
• Developed BI applications using Tableau and Microsoft Power BI as part of the product line, covering use cases that helped reduce internal operations time by over 30%.
• Developed a BI application that gathered internal systems usage data, helping clients save roughly $10,000 over time by removing access to unused user accounts.
• Developed a data pipeline in Python to access Tableau Server logs, enabling analysis of log data to understand application usage and improve application performance.
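The error-logging/notification framework mentioned above might look like the following minimal sketch. The job name, step, and notifier callback are illustrative assumptions, not details from the original system (which was built around SSIS refresh jobs).

```python
# Sketch of an ETL error-logging framework: a failing refresh step is
# logged and forwarded to a notifier (e.g. an email/Send Mail step).
import logging

def run_job(name, step, notify):
    """Run one ETL step; on failure, log it and notify, returning False."""
    logger = logging.getLogger(name)
    try:
        step()
        return True
    except Exception as exc:
        logger.error("job %s failed: %s", name, exc)
        notify(f"{name} failed: {exc}")
        return False
```

Centralizing failure handling like this is what makes it possible to cut debugging time: every job reports through the same channel instead of failing silently.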
- Data Engineer at LAVIS Research Informatics, Inc
3 years at this Job
- Master of Science - Management Information Systems
- Master of Science - Software Engineering
- Bachelor of Science - Computer Science
• Transferred approximately 8.6 GB of data from a Hadoop edge node to an NFS mount, compressing the data in Linux for a successful and faster transfer over SFTP
• Aided a Data Engineer in successfully building ETL drafts for new tables, enriching the data by adding medical claims tables to the clinical data
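The compress-before-transfer step above (done on Linux, likely with `gzip` or `tar`) can be sketched with Python's stdlib `gzip` module; the file paths here are illustrative.

```python
# Sketch: gzip-compress a file so a subsequent SFTP transfer moves
# fewer bytes. Paths are illustrative; the original used Linux tooling.
import gzip
import shutil

def compress_for_transfer(src, dst=None):
    """Write a gzip copy of src and return the compressed file's path."""
    dst = dst or src + ".gz"
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For text-heavy exports the compressed file is typically a fraction of the original size, which is what makes the SFTP leg faster.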
- Associate Data Engineer at Optum, UnitedHealth Group
- Data Analyst at Optum, UnitedHealth Group
- Faculty Research Assistant at Syracuse University
- Data Analyst Intern at Lam Research Corporation
5 months at this Job
- MS - INFORMATION MANAGEMENT
- BE - INFORMATION TECHNOLOGY
Profile: Application Developer/Data Engineer. Project: BP-ADAM. I was responsible for development, support, and maintenance activities, mainly database handling and server maintenance through Maximo, for our client British Petroleum (BP). I handled development issues (Java) along with database administration using Oracle SQL in a production environment. Using Python to handle data pipelines from multiple sources, and analyzing and channeling them, gave me experience in data warehousing and data science. Working at one of the biggest IT organizations in the world allowed me to work in the Agile model and learn the importance of teamwork, while giving me profound insight into the USA's work culture. Furthermore, my efficiency and dedication, along with my passion to learn, earned me the 'Manager's Choice' performance award twice in a row. IBM meticulously enhanced my data analytics and machine learning skills.
- Application Developer/Data Engineer at IBM
- Intern at Tata Steel
1 year, 11 months at this Job
- Master of Science - Computer Science
- Bachelor's - Computer Science
Worked as a Data Engineer in the SDR portfolio, a Cloudera-based Hadoop ecosystem. The project goal was to set up a data pipeline: data from a SQL Server database was loaded into HDFS, Hive external tables were defined, and data was retrieved from HDFS using Hive queries and Impala, making it available to various business units for reporting. PySpark jobs were created to transform data using Hadoop files, RDDs, and DataFrames.
• Worked with BAs, product owners, and end users to understand process requirements and acceptance criteria
• Created Sqoop jobs to extract data from SQL Server DB and load in HDFS
• Defined HIVE tables as per downstream application data need
• Designed and developed ETL workflows using Spark SQL to deliver data from HDFS into downstream applications
• Designed and developed Spark programs in Python to filter, transform data using RDD, Data frame APIs
• Scheduled jobs in Oozie
Environment: CDH, SQL, Sqoop, HDFS, Hive, Python, Spark SQL, Oozie
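A plain-Python stand-in for the filter/transform step described above can look like this. The real job used PySpark RDD/DataFrame APIs; the column names and cutoff here are hypothetical.

```python
# Stand-in for a PySpark filter/select step: keep rows at or after a
# cutoff year and project the columns downstream tables expect.
# In the real job these would be DataFrame .filter()/.select() calls.
def transform(rows, min_year):
    """Filter claim rows by year and project id/amount columns."""
    return [
        {"id": r["id"], "amount": round(r["amount"], 2)}
        for r in rows
        if r["year"] >= min_year
    ]
```

The same shape (predicate filter, then column projection with light cleanup) carries over directly to Spark, where it runs distributed across partitions instead of over a Python list.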
- Senior Data Engineer at Anthem
- Data Engineer at Care More
- Data Analyst at Health Net
- Senior Developer at Key Bank
2 years, 9 months at this Job
- Bachelor of Engineering in Computer Science - Computer Science
Create, maintain, and automate data pipelines
Sr. Big Data Engineer
• Heavily involved in data modeling, ETL operations, and testing data warehousing operations.
• Designed, maintained, and automated big data ETL pipelines using Python 3.6.
• Used Daiquery extensively for ad hoc ETL queries. Memory-based queries were executed using Presto for faster processing; bulk data above 5 TB was processed using Hive.
• Also contributed to work in MS-SQL.
• Carried out operations efficiently, not only in coding and development but also taking ownership of creating and executing test scripts.
• Dug into complex code and optimized it where necessary; performance tuning was also a key aspect of day-to-day activity.
• Used the Atom editor and Mercurial for the code repository and version control.
• Analyzed data and performed complex customer data computations using Jupyter Notebook, a Python web-based tool.
• Participated in business requirements gathering and provided training to various technical and business users.
Environment/tools: MS-SQL, Hive, Presto, Unix Bash Shell.
- Sr. Big Data Engineer at Social Media, Gaming and Virtual Reality
- Sr. Big Data Software Engineer at Markit On Demand Inc
- Master Data Management Consultant at LumenData Inc
- Java/Oracle Engineer at SagaVisions Private Limited
3 years, 3 months at this Job
- Masters in Electrical and Computer Engineering - Electrical and Computer Engineering
- Bachelors in Electrical and Computer Engineering - Electrical and Computer Engineering