Data Architect, Data Hobbyist, and Data Scientist
This is the super long version with EVERYTHING on it.
I am a data scientist, architect, and artist with expertise in the design, implementation, and maintenance of enterprise-scale business intelligence, data mining, data quality, and data migration projects. I have led teams of people through every phase of Enterprise Data Management, and I bring experience from high-profile, large-scale clients such as NASA, SpaceX, the US Air Force and the US Senate. I have experience with the program management and development lifecycle standards required to run organized, repeatable, and successful data warehousing projects. I have personally designed data models, coded extract-transform-load (ETL) processes, built reporting architectures, and administered databases and application servers. I have progressed into planning, architecting, and managing teams performing these tasks on increasingly large and complex systems.
Clearance: DoD Top Secret (expired 2010)
- "Discrepancies in metabolomic biomarker identification from patient-derived lung cancer revealed by combined variation in data pre-treatment and imputation methods" - Metabolomics
- "Prediction of lung cancer patient survival via supervised machine learning classification techniques" - International Journal of Medical Informatics
- "Application of unsupervised analysis techniques to lung cancer patient data" - PubMed / PLOS ONE
Manifold AI, US (Remote) Director, AI Solutions Labs (Mar. 2021 - Present)
- Moved a 40-person AI startup through a growth period doubling our size (and counting)
- Revamped recruiting, establishing processes for sourcing, interviewing, and hiring to transition from a referral-based company to one that actively recruits
- Established and continually improved company-wide, opinionated, repeatable processes for CI/CD, testing, rapid application development, code standards, documentation, ticketing hygiene, agile processes, DevOps standards, and other essential software development practices
- Architected and built software incorporating data engineering and machine learning techniques for a variety of uses:
- PubMed database ML search tool to accelerate focused research
- Bulk life sciences data/metadata ingest and search tool for aligning ontological meaning across large volumes of clinical and research data
- Predictive modeling tool to optimize sales pricing for a large tech company
- Massive scale internet advertising data management pipelines for analysis and machine learning
Space Exploration Technologies (SpaceX) Hawthorne, CA Sr. Data Scientist / Sr. Business Intelligence Engineer / Sr. Software Developer (Feb. 2018 - Mar. 2021)
- Led the "Build and Launch" DevOps squad to improve and support dozens of major SpaceX systems including a massive Telemetry and custom manufacturing / ERP system
- Led the team responsible for our distributed automatic hardware testing system written in Python with a PostgreSQL/Redis back-end
- Integrated hardware simulations and scripted tests with live and recorded telemetry
- Implemented standalone dockerized versions for Windows and Linux for offsite, offshore, and classified deployments
- Established circular integration tests with our rocket testing process to ensure development was not hindered by rapid changes to the testing infrastructure
- Worked with the Python development team to patch bugs we uncovered from pushing Python to its limits in scientific cross platform parallel computing
- Programmed a distributed redundant telemetry system (Borg) in Go and Python, leveraging PostgreSQL, Ansible, Vagrant and Bamboo
- This on-premises Telemetry system ingested ~30TB per day and provided graphical analysis for over 1000 engineers on 22+ Petabytes of data
- Iterated to mature DevOps with high standards for test coverage, deployment automation, and value per developer time
- Leveraged Elasticsearch/Logstash/Kibana/Grafana to build a telemetry Command and Control center for high availability and early problem detection
- Created an isolated secure Supply Chain vendor portal to improve purchasing lead time and reliability
- Performed SQL Server development with SQL, R, Reporting, Integration, and Analysis Services along with a C#/.NET custom ERP
- Did significant DevOps work, managing complex deployments and ops tools including PagerDuty (to OpsGenie), Sentry, and SolarWinds
- Worked on Data Science/Machine Learning tasks including predictive analytics for supply chain management using natural language processing and computer vision
- Facilitated teams performing computer vision to automate human tasks like video review, control panel configuration, lost item detection, metal fatigue and other issues
- Implemented architecture for Python/Flask/Database production predictive analytics and machine learning tasks
- Maintained the Operating Plan - predictive data set for headcount/resource management planning
- Provided data analysis to reduce scrap costing over $150M
- Designed and implemented metrics/KPI architecture from root code to C-suite dashboards
- Managed and maintained full-stack C# development environment with Jenkins, Bitbucket, Git and other tools to support all company operations
- A few personally exciting things:
- I may have been the first person to commit "Starship" to code non-ironically
- The first time I got to write code that used the speed of light in a real-world meaningful way
- Got to use quaternions for something practical (they're used to calculate optimal rotations for, hypothetically, pointing lasers at things)
- Worked directly with astronauts as a test/simulation dev/ops - (the Astronaut sims ran on our software)
- Maintained a "Bug of the Month" page with some crazy bugs - I drove an investigation that found corrupt TCP packets due to a short-lived Linux kernel bug which made certain versions of Linux unusable on the rocket
- Worked directly with Python developers to fix a bug in Python 3.7 that affected our cross platform testing application
- Maintained / contributed to so much other software to a lesser extent... WarpWise (scheduled reporting), Lunchpad, Requirements management tool and data sync (ReqX), Card Swipe/Onsite Security system, Custom build system (Gauntlet)
Passport Health Plan Louisville, KY Data Architect (October 2012 - 2015 & 2017 - Feb. 2018)
- Designed analytical algorithms to enable geospatial reporting on patient coverage
- Established a new data warehouse from the ground up for a Medicare health care management company of ~250 people
- Built training materials, project plan, data models, design patterns, coding standards, physical architecture and infrastructure and all relevant documentation for a completely new data warehouse from scratch
- Quickly implemented Data Quality and Data Dictionary systems through designs pioneered through my previous projects
- Integrated sources from COTS health care, call center, IT management, and other systems into an integrated data warehouse following a Kimball-compliant star schema design
- Implemented the design in Microsoft SQL Server with SSIS/SSAS/SSRS components and a Tableau based dashboard
- Managed ERP and DWH migration during a large corporate merger - integrated data from outgoing and incoming systems into the DWH to minimize stress on reporting
Farm Credit Mid America Louisville, KY Data Architect (Jan 2015 - Jan 2017)
- Led a very small team to build a new data warehouse from scratch, designing policy, procedures, and establishing a VP-Level governing steering committee
- Coordinated predictive analytics capabilities for credit scoring, loan value, collateral valuation and other metrics
- Designed a star schema database with multidimensional cube support to house data from our $21 Billion Loan/Lease business
- Enabled initial end user capability in only a few months; continued to improve and roll out BI capabilities and expand data warehouse footprint
Wright Patterson Air Force Base Dayton, OH
Data Architect and Contractor with Deloitte (1998-2000, 2003-2012)
- Performed systems and network analysis, visualizations, and requirements planning for an IT assessment and Roadmap of AF Global Logistics Supply Chain (GLSC) owned and managed systems
- Designed and implemented machine learning algorithms to improve data quality and value of supply chain data
- Managed a staff of 24 people in a leadership role on a 40+ member Data Quality initiative, part of a 150+ person ERP team.
- Provided data management, quality, governance, and oversight for hundreds of logistics and financial systems with the new policies, procedures, and data requirements introduced by the USAF’s Oracle ERP installation – one of the largest Oracle implementations in the world, managing a $36 billion inventory and related supply chain.
- Responsible for the Data Management Organization’s Master Data Management component, including the technical management of various lists of values, building of a data dictionary, and integration with the enterprise data modeling efforts.
- Created an IT strategy and roadmap for the Air Force’s Global Logistics Support Center focusing on Data Management, Systems, Maintenance and Governance.
- Led a team to create a Business Intelligence strategy for the Air Force ERP system. The design stretches from high level governance and policy issues to technical Business Intelligence architecture and enterprise warehouse design.
- Performed as the technical team lead for a development team which integrated over 50 source systems and data marts into a large data warehouse. Ultimately we were responsible for integrating maintenance, supply, logistics data, and systems tracking specific assets into the Air Force Enterprise Data Warehouse.
- Built standards and repeatable CMMI processes for every aspect of Data Warehouse management, including modeling and loading new source data, data mart processing, report specification and building, ongoing maintenance, and pretty much everything in between.
- Set up an early adoption of MQ Series and Oracle AQ for site-to-site communications (supporting RAMPOD)
- Technical demands included Oracle, WebSphere MQ, Business Objects, Cognos, Informatica, SharePoint, MediaWiki, and Teradata support, design, and development in a closely integrated environment.
- Personal highlights:
- Briefing 2-star generals on Nuclear Weapon Related Materials (NWRM); directly supported NWRM investigations when the news media reported material missing and transported in violation of procedure
- Working on CINC-57 ... a list of very high level readiness questions to support the Commander-In-Chief (the President)
- Coming up with a 7-level data quality framework that I still use
- Implementing very early machine learning as early as the 1990s - including a Bayesian tool to predict and clean work order codes (work unit codes/WUCs)
- Working with F-35 and F-22 logistics and maintenance as they were introduced to the fleet and in testing to ensure smooth interaction with enterprise data systems
- Working with AWACS in the mile-long building in Oklahoma City; improving efficiency on the manufacturing lines
- Working on integrated tool systems which are air dropped into forward locations. Data latency measured in months is fun.
- (Attempting to) catalog every data system within the USAF
- Seeing national news articles about our work (mostly positive, some negative)
Cincinnati Financial Corp. Cincinnati, OH Data Warehouse Consultant with Deloitte (Jan-Feb 2010)
- Provided third party evaluation of a nascent EDW implementation plan. Helped drive Enterprise Service Bus (ESB) adoption and identified solutions to project implementation roadblocks.
- Consulted on a data science sandbox design to improve valuation estimates from automobile accident insurance claims
FirstEnergy Corp. Akron, OH, Data Migration Contractor (2002-2003)
- Supplied technical expertise and programming support for a large scale SAP R/3 migration using Informatica, Oracle, and SAP/ABAP tools.
- Worked closely with both a functional and technical team to design, develop, test, and deploy a migration mechanism for Human Resources data.
US Postal Service Raleigh, NC, Data Warehouse Contractor (2002)
- Provided Informatica and data warehousing expertise for the USPS very large data warehouse initiative.
- Primary responsibility was increasing performance on very large data loads to the warehouse. For example, I reduced the running time for some loads from 80+ hours down to 4 hours.
Clients with KPMG Consulting Washington, DC, Senior Consultant (1998-2002)
City and County of San Francisco (2000-2001)
- Designed and implemented a terabyte scale data warehouse for the City.
- Responsible daily for Oracle tuning and database administration (DBA) work, Informatica and Cognos design, development, and performance tuning. Also administered large Windows NT and HP-UX servers.
- Marketed the warehouse to senior department heads, eventually creating departmental data marts and a distributed Oracle environment.
- Developed end user training, documentation, and robust testing/validation procedures.
National Aeronautics and Space Administration (NASA) (1999-2000)
- Lead technical expert on a team that designed and developed an enterprise-wide data warehouse (EDW) for NASA. Duties included researching, specifying, purchasing and administering $1.5 million of Sun hardware running the Solaris operating system.
- Involved in the entire lifecycle of this EDW. Met with scores of functional people from across the country to develop logical and physical data models. Led a small team in the Informatica development for data transformation and cleansing from disparate source systems to the Oracle EDW. Worked with the Holos OLAP package and Crystal Reports and Brio ad-hoc query tools to deliver a web-based reporting solution to the client's specifications.
- Fun to see national news coverage of our work
Performance Executive (various clients)
- Performance Executive was a packaged data warehouse that we marketed to clients of our proprietary accounting systems. I was one of several technical resources on the project throughout my tenure at KPMG, and I was involved with every aspect of the product's creation, testing, and marketing. The product was primarily back-ended in Oracle 7, 8, or 8i, and Microsoft SQL Server, with an Informatica engine for data cleansing and transformation. The front-end tools of choice were Cognos Impromptu and PowerPlay, although some clients used Business Objects, Brio, MicroStrategy and Crystal Reports.
US Senate, Washington, DC (Fall, 1999)
City of Ottawa, Canada (as needed 2000)
Dahlgren Naval Base, Dahlgren, VA (as needed, 1999-2000)
Wright Patterson Air Force Base, Dayton, OH (Summer, 1998)
Oakland County, Michigan (Sept. 2001-Feb. 2002)
Other Work Experience:
Xavier University Cincinnati, Ohio WebMaster/Programmer/Analyst (Fall 1993 - March 1998)
Farwell and Hendricks Cincinnati, Ohio Senior Thesis Project
- Worked with Farwell and Hendricks to develop a Windows 95 program to collect and analyze data for electronic devices. The program collected 5,000-10,000 samples/second from hardware placed on an earthquake simulator, displayed graphs, and analyzed results to determine instabilities in electric current. Neat stuff!
Education
Xavier University, Cincinnati, Ohio
- Bachelor of Science in Mathematics and Computer Science
University of Louisville, Louisville, KY
- Graduate Certificate in Data Mining
- Currently pursuing a PhD in Computer Science with a focus on Data Mining
Extensive Experience With
- R and Python
- Stash, Git
- Microsoft SQL Server
- Informatica PowerCenter
- Tableau, Cognos, and Business Objects Business Intelligence Products
- MySQL, Oracle and Teradata Database Products and SQL
- ERwin Data Modeling Tool
- Windows Server, Linux and Unix (Various) Operating Systems
- Apache, PHP, HTML, TCP/IP, Other Internet Technologies
Working Knowledge of
- Jira, Bamboo, Jenkins
- MicroStrategy, Brio, Holos, Crystal Reports
- Apple Macintosh OS, OpenVMS
- Microsoft IIS, C#
- SAP R/3: ABAP and HR InfoTypes
- DCL, Pascal, C, C++, Visual Basic, C#/.Net, SQL, Go
- PGP, Numerical Cryptographic Theory, System Security
- Numerous other software packages, feel free to ask
- Sun, Hewlett-Packard, and IBM Hardware
- MelissaData Geocoding and Fuzzy Matching SSIS Packages
Other Relevant Experience
- Oracle Unified Methodology Level 3 Certification
- SEI/CMMI project lifecycle management standards
- DAMA DMBOK and ITIL Practices
- Medical Coding including ICD-9, ICD-10, CPT, DRG, BETOS, HCPCS, and others
- Department of Defense Architecture Framework (DoDAF)
- Top 1% ranking in Kaggle and top 5% finisher in their $3 million Heritage Health Prize Data Mining Competition
- Completed Coursera's May 2013 Data Mining class "with Distinction"
- Maintain websites for http://heartsforkenya.com/ and http://daytonartists.org/
- Guest lecturer at Software Guild, Louisville Makes Games, Louisville Data Science Interest Group, and Microsoft SQL Saturday