Details
Posted: 14-May-22
Location: Boston, Massachusetts
Salary: Unpaid
Internal Number: 3193040-3499_1652389959
Home Base, a Red Sox Foundation and Massachusetts General Hospital program, is an innovative public/private partnership dedicated to improving the lives of service members, Veterans, and their Family Members living with invisible wounds such as post-traumatic stress disorder (PTSD), traumatic brain injury (TBI), anxiety, depression, military sexual trauma, and family/relationship challenges. Our program provides intensive outpatient care, traditional outpatient care, couples therapy, wellness, and fitness programs, as well as community outreach and education. Home Base also serves as a leader in research, identifying and implementing new treatments for the invisible wounds. Central to our mission is a value of inclusivity and equity. We strive to create an environment in which colleagues and patients are seen, heard, and treated with dignity and respect. Since its inception in 2009, Home Base has served more than 25,000 Veterans and their family members, trained more than 85,000 clinicians nationally, and remains at the forefront of discovering new treatments, ensuring a brighter future for 21st century warriors and their families.
Home Base seeks to hire a data engineer to start April 2022, who will support a variety of database/warehouse management, ETL scripting, and data validation tasks that include but are not limited to querying databases, restructuring data, cleaning and validating data, performing manual ETL tasks, automating ETL tasks using tools and custom scripting, full pipeline management/monitoring, improving systems and processes, and documenting data systems. The qualified candidate will be highly detail-oriented and have a strong interest in and aptitude for data management and engineering. Some specific focus areas will be determined based on the candidate's skills and interests.
The successful candidate must be highly organized, motivated, and able to thrive in a fast-paced team environment and must enjoy the challenge of a dynamic environment with evolving needs. It is extremely important that the candidate possess the ability to carefully keep track of multiple work streams.
Qualifications
PRINCIPAL DUTIES AND RESPONSIBILITIES:
Relevant activities include, but are not limited to, the following:
- Achieving an extremely detailed understanding of our current data ecosystem, including its structure, data meaning, history, flow/processing, and challenges
- Utilizing, improving, and constructing ETL tools and data warehousing solutions
- Running current SQL, Python, PHP, and/or Tableau Prep ETL scripts
- Using various monitoring and evaluation methods to validate that data flowing through these pipelines is accurate, and troubleshooting/addressing issues when they are discovered
- Data warehouse maintenance and support
- Improving and better integrating scripts (ETL and validation) and warehouse elements into various data pipelines to achieve greater efficiency, reliability, and functionality
- Constructing new ETL tools and warehouse components as necessary, specifically including a dedicated-use pipeline for a new collaborative research project
- Data Cleaning
- Writing queries (SQL) and scripts (Python) to identify data quality problems
- Investigating the root cause of data quality problems
- Working with appropriate team members to determine appropriate data remediation and process improvement plans
- Developing queries and scripts as needed to repair data in bulk
- Developing and managing data quality and infrastructure monitoring dashboards
- Additional Responsibilities
- Supporting the team as needed with data querying (particularly of the data warehouse), processing, analysis and reporting for both regular and ad-hoc requests from clinical, executive, and external audiences
- Researching potential new data engineering solutions, analyzing feasibility, and assisting technical leadership in road-mapping and designing the evolution of our data infrastructure
- Creating and maintaining documentation across our data ecosystem
SKILLS & COMPETENCIES REQUIRED:
- Background
- Degree in Health Informatics, Computer Science, Statistics, Mathematics, Engineering, or a similar field
- Familiarity with behavioral health clinical practice and/or research preferred
- Technical
- Procedural programming for data manipulation using Python, NumPy, and Pandas
- PHP, Java, or other languages are a plus
- Knowledge of relational database platforms, data modeling, and warehousing
- Comfortable extracting data from and loading data into sources ranging from an Enterprise Data Warehouse to an Excel or text file, using built-in tools or custom-written ETL scripts
- Knowledge of data aggregation and transformation processes (e.g., pivot, merge, union, hierarchical grouping, aggregation functions)
- Above-average SQL skills (e.g., familiarity with subqueries, multiple joins, and grouping), specifically in MySQL; SQL Server experience is a plus
- Comfortable with complex multi-stage, multi-technology ETL pipelines
- Comfortable using APIs to transmit data in both an ad-hoc and automated manner
- Familiar with concepts/tools of Data Quality Management as well as Data Governance practices
- Professional
- Ability to interpret and follow through on data requirements with strong attention to detail
- Strength in independently validating and debugging code and analyses, including consulting documentation, Stack Exchange, etc.
- Demonstrates personal initiative and time management skills, as well as the ability to work effectively and kindly as part of a team
- Excellent verbal and written communication skills
- Familiar with agile software development methodologies
- Interest in identifying process improvement opportunities is a plus
LICENSES, CERTIFICATIONS, and/or REGISTRATIONS:
- Required: Undergraduate degree in Health Informatics, Computer Science, Statistics, Mathematics, Engineering, or a related subject.
- Preferred: Graduate degree in one of the above.
Preferred coursework would include most of the following:
- Intermediate Databases and SQL
- Intermediate Programming (Procedural and/or OO)
- Data Structures and Algorithms
- Data Quality Management
- Data Flow and Automation
- Agile Project Management
Equivalent Experience - Equivalent time and aptitude achieved through work experience may substitute for some of the preferred courses listed above.
EXPERIENCE:
Preferred: 2+ years of experience in data management in a healthcare/clinical setting; however, recent or anticipated college graduates will be considered.
WORKING CONDITIONS:
Shared spaces; open work setting model. Hybrid remote schedule.
EEO Statement
Massachusetts General Hospital is an Equal Opportunity Employer. By embracing diverse skills, perspectives and ideas, we choose to lead. Applications from protected veterans and individuals with disabilities are strongly encouraged. Partner's Healthcare is acting as an Employment Agency in relation to this vacancy.