About

I am a builder who became a machine learning research engineer.

I’m currently CTO & Co-Founder at Deep MedChem (Prague), where I lead hands-on R&D work across:

  • large model training + inference pipelines,
  • scalable vector retrieval of molecules (2D/3D similarity),
  • evaluation harnesses and benchmarking,
  • and product-grade scientific software (APIs + UI + deployment).

Selected work

CHEESE — Chemical Embeddings Search Engine (first author)

CHEESE reformulates ligand-based screening with expensive 3D metrics into scalable vector search. It supports 2D fingerprints + 3D shape + 3D electrostatics similarity, and is shipped as a product suite (Search / Explorer / Modeller / Electrostatics).

Public metrics (from the paper mirror + product docs):

  • Reported up to 10^3x speedup and 10^6x lower cost per query than state-of-the-art methods on established benchmark suites.
  • Systems: I built a custom disk-based vector DB indexing 40B+ isometric embeddings.

SynthonGPT (first author)

SynthonGPT is a compact synthon-conditioned transformer for navigating makeable chemical space (grounded in vendor enumerations rather than hallucinated SMILES).

Public metrics (from the report):

  • Count-matched benchmarks show up to 3.1x higher unique scaffold recovery vs F‑Trees and 1.76x vs SpaceLight while maintaining higher diversity (lower mean similarity).
  • ~90M params, trained in ~10 hours on a single RTX 4090; sub-second inference on CPU/GPU (report).

CellARC (first author)

CellARC is a synthetic benchmark for abstraction/reasoning built from multicolour 1D cellular automata, with reproducible dataset generation, baselines, and a public leaderboard.

BitBIRCH-Lean (co-author)

I co-authored BitBIRCH-Lean, a memory-efficient implementation of the BitBIRCH clustering algorithm for very large molecular libraries. I contributed the bit-packing and optimization work that helped the implementation use 8x less memory while running 2x faster.

BitBIRCH-Lean uses compressed fingerprint representations inside the clustering tree and supports optional C++ acceleration, enabling high-throughput clustering workflows on workstation-scale hardware rather than requiring specialized infrastructure.
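
As a rough illustration of where an 8x memory saving from bit-packing can come from, here is a generic NumPy sketch. It is not the BitBIRCH-Lean code: the toy data, the array sizes, and the helper name `tanimoto_packed` are all assumptions made for this example. It stores each fingerprint bit in one bit instead of one byte and computes Tanimoto similarity directly on the packed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 1,000 binary fingerprints of 2,048 bits,
# stored naively as one uint8 per bit.
fps = rng.integers(0, 2, size=(1000, 2048), dtype=np.uint8)

# Pack 8 bits into each byte along the fingerprint axis.
packed = np.packbits(fps, axis=1)


def tanimoto_packed(a: np.ndarray, b: np.ndarray) -> float:
    """Tanimoto similarity of two packed fingerprints (illustrative helper)."""
    # Bitwise AND/OR work directly on packed bytes; unpacking is used
    # here only to count the set bits.
    inter = int(np.unpackbits(a & b).sum())
    union = int(np.unpackbits(a | b).sum())
    return inter / union if union else 1.0


# Memory ratio between the unpacked and packed arrays.
ratio = fps.nbytes // packed.nbytes
sim = tanimoto_packed(packed[0], packed[1])
```

In this sketch the saving comes purely from the storage layout (one bit per fingerprint bit rather than one byte); any speedup in a real implementation would depend on how the packed data is processed.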

Related: paper, GitHub

Experience snapshot

2024 - present

CTO & Co-Founder, Deep MedChem

Foundational models for large-scale molecular search, evaluation, and deployed scientific software (cloud/on‑prem).

2022 - 2024

Research Scientist in Machine Learning, The MAMA AI

R&D; model training; production ML pipelines; enterprise client projects.

2021 - 2022

Machine Learning in Bioinformatics, Biodviser

Neural alignment-free sequence analysis and representation learning.

2021 - 2022

Python Software Developer, Charles University

Built software used by the Central Library.

2018 - 2021

Research internships and freelancing

Scientific computing, data analysis, mathematical methods...

Background

I grew into research by building and shipping systems from an early age, and most of my formation happened in real scientific and engineering settings rather than through a conventional academic ladder.

2021 - 2023

Bioinformatics, Charles University

Coursework in computer science, mathematics, biology, chemistry.

2019 - 2021

Philosophy, Charles University

Earlier coursework.