Data Engineer
Allozymes
Allozymes is a deep tech company based in Singapore. We are revolutionising the way industry uses enzymes to manufacture chemicals and natural compounds. Our rapid discovery and evolution of custom-designed enzymes enables breakthrough developments in the sustainable production of ingredients for pharmaceuticals, cosmetics, chemicals, food and beverages.
We’re hiring a highly capable data engineer to join our Data team. This team is responsible for developing and implementing state-of-the-art approaches for evolving enzymes and microbes to enhance the production of chemicals and natural compounds. The engineer will contribute to the development of Allozymes’ cloud-based data infrastructure, work on upgrading our enzyme and strain optimization platform, and collaborate in the development of customer-ready solutions. Working in a highly collaborative and dynamic environment, you will have the opportunity to interact with scientists, automation engineers and process engineers to achieve these goals.
Responsibilities
- Contribute to the design and improvement of Allozymes’ cloud-based data infrastructure.
- Develop, implement and deploy cloud-based data analysis pipelines with a variety of data science, cloud computing and analytics methods.
- Create automated and reliable ETL pipelines.
- Manage, store and process proprietary datasets and data lakes to find opportunities for process optimization.
- Apply predictive models to test the effectiveness of different courses of action.
- Collaborate in the development and deployment of custom UI solutions.
Requirements & Qualifications
- MS or PhD in statistics, mathematics, applied mathematics, computer science, bioinformatics or a related quantitative field with a focus on data science and data engineering.
- 2+ years of relevant industry experience.
- Proficient in Python.
- Experience with cloud service providers, such as AWS, and cloud-based data warehouses.
- Experience with SQL/NoSQL.
- Expertise in Python libraries such as pandas, NumPy, SciPy, Matplotlib, Seaborn and scikit-learn.
- Strong experience in handling big data, extracting insights from data, and building and deploying data science/ETL pipelines.
- Ability to work in a fast-paced, collaborative and cross-functional environment and communicate results effectively to management.
- Experience building RESTful APIs (Django, FastAPI, etc.) is a plus.
- Experience with front-end development is a plus.
- Knowledge and experience in statistical and data mining techniques is a plus.
- Experience in biology, bioinformatics or computational biology is a plus.
- Experience with version control and platforms such as GitHub/GitLab is a plus.
- Experience with CI/CD processes is a plus.
- Experience with containers and container orchestration tools is a plus.
- Experience writing production-ready code (version-controlled, scalable, well-documented, testable, deployment-ready) is a plus.
Candidates who fulfill the above criteria are encouraged to apply here.