About

I am a machine learning research engineer building retrieval, representation-learning, and evaluation systems for scientific discovery.

I’m currently CTO & Co-Founder at Deep MedChem (Prague), where I lead hands-on R&D work across:

large model training + inference pipelines,
scalable vector retrieval of molecules (2D/3D similarity),
evaluation harnesses and benchmarking,
and product-grade scientific software (APIs + UI + deployment).

Recent work spans CHEESE for 3D shape/electrostatic molecular search, synthon-native search and cartography, controlled reasoning benchmarks, and weight-space / neuro-symbolic evaluation audits. In 2026, five of my papers were accepted to ICML workshops across GenBio, WSS, CTB, and CompLearn.

Research interests

representation learning
retrieval / ANN / indexing
evaluation and benchmark design
scientific ML
reasoning / synthetic datasets
product-grade research systems

Selected work

CHEESE — Chemical Embeddings Search Engine (first author)

CHEESE reformulates ligand-based screening with expensive 3D metrics into scalable vector search. It supports 2D fingerprints + 3D shape + 3D electrostatics similarity, and is shipped as a product suite (Search / Explorer / Modeller / Electrostatics).

Public metrics (from the paper mirror + product docs):

Reported up to a 10^3-fold speedup and 10^6-fold lower cost per query than SOTA on established benchmark suites.
Systems: I built a custom disk-based vector DB indexing 40B+ isometric embeddings.
Prospective use: CHEESE electrostatic embeddings prioritized NS1, a tetrazolic neurosteroid that later advanced to in vivo validation in an osteoarthritis pain model.

Links:

Paper landing: /publications/cheese-paper
CHEESE Search: https://cheese.deepmedchem.com
Supplementary repo: https://github.com/Deep-MedChem/cheese-paper
NS1 case study: /blog/cheese-neurosteroid-in-vivo

ICML 2026 workshop papers

Accepted workshop papers from recent evaluation and scientific-ML work:

SynthonGPT (first author)

SynthonGPT is a compact synthon-conditioned transformer for navigating makeable chemical space (grounded in vendor enumerations rather than hallucinated SMILES).

Public metrics (from the report):

Count-matched benchmarks show up to 3.1x higher unique scaffold recovery vs F‑Trees and 1.76x vs SpaceLight while maintaining higher diversity (lower mean similarity).
~90M params, trained in ~10 hours on a single RTX 4090; sub-second inference on CPU/GPU (report).

Links:

Report: https://synthongpt.mireklzicar.com/report.pdf
Demo: https://synthongpt.mireklzicar.com

CellARC (first author)

CellARC is a synthetic benchmark for abstraction/reasoning built from multicolour 1D cellular automata, with reproducible dataset generation, baselines, and a public leaderboard.
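
For readers unfamiliar with the setting, here is a minimal sketch of a single update step of a multicolour 1D cellular automaton. It is an illustration only, not the CellARC generator: the names `make_rule` and `step`, the random lookup-table rule, and the wrap-around boundary are all assumptions made for this example.

```python
import random


def make_rule(k, radius, rng):
    """Hypothetical helper: a random lookup table mapping every
    neighbourhood (read as a base-k number) to an output colour."""
    return [rng.randrange(k) for _ in range(k ** (2 * radius + 1))]


def step(state, rule, k, radius):
    """Apply one synchronous update with wrap-around boundaries (assumed)."""
    n = len(state)
    out = []
    for i in range(n):
        idx = 0
        # Encode the neighbourhood left-to-right as a base-k index.
        for d in range(-radius, radius + 1):
            idx = idx * k + state[(i + d) % n]
        out.append(rule[idx])
    return out


rng = random.Random(0)
k, radius = 3, 1                      # 3 colours, nearest-neighbour rule
rule = make_rule(k, radius, rng)
state = [rng.randrange(k) for _ in range(16)]
next_state = step(state, rule, k, radius)
```

Iterating `step` on a fixed rule produces the kind of input/output rows from which a solver could be asked to infer the hidden rule.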

Links:

Paper: https://arxiv.org/abs/2511.07908
Repo: https://github.com/mireklzicar/cellarc
Leaderboard: https://cellarc.mireklzicar.com

BitBIRCH-Lean (co-author)

Co-authored BitBIRCH-Lean, a memory-efficient implementation of the BitBIRCH clustering algorithm for very large molecular libraries. I contributed the bit-packing and optimization work that helped make the implementation use 8x less memory while being 2x faster.

BitBIRCH-Lean uses compressed fingerprint representations inside the clustering tree and supports optional C++ acceleration, enabling high-throughput clustering workflows on workstation-scale hardware rather than requiring specialized infrastructure.
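
As a rough illustration of the general idea behind bit-packing (not the BitBIRCH-Lean code itself), storing a binary fingerprint at one bit per position instead of one byte per position cuts its size by a factor of 8, and Tanimoto similarity can still be computed directly on the packed bytes. The function names and the toy 2048-bit fingerprints below are assumptions made for this sketch.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, 8 bits per byte (MSB first)."""
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


def popcount(packed):
    """Count the set bits in a packed fingerprint."""
    return bin(int.from_bytes(packed, "big")).count("1")


def tanimoto(a, b):
    """Tanimoto similarity computed on two packed fingerprints of equal length."""
    inter = popcount(bytes(x & y for x, y in zip(a, b)))
    union = popcount(a) + popcount(b) - inter
    return inter / union if union else 1.0


# Toy 2048-bit fingerprints (illustrative data, not real molecules).
fp_a = [1, 0, 1, 1, 0, 0, 0, 1] * 256
fp_b = [1, 1, 0, 1, 0, 0, 0, 1] * 256
pa, pb = pack_bits(fp_a), pack_bits(fp_b)

ratio = len(bytes(fp_a)) / len(pa)   # one byte per bit vs. one bit per bit
sim = tanimoto(pa, pb)
```

Here `ratio` compares a one-byte-per-bit layout with the packed layout, and `sim` is the similarity of the two toy fingerprints.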

Related: paper, GitHub

Experience snapshot

2024 - present

CTO & Co-Founder, Deep MedChem

Foundational models for large-scale molecular search, synthon-native retrieval, evaluation, and deployed scientific software (cloud/on-prem).

2022 - 2024

Research Scientist in Machine Learning, The MAMA AI

R&D; model training; production ML pipelines; enterprise client projects.

2021 - 2022

Machine Learning in Bioinformatics, Biodviser

Neural alignment-free sequence analysis and representation learning.

2021 - 2022

Python Software Developer, Charles University

Built software used by the Central Library.

2018 - 2021

Research internships and freelancing

Scientific computing, data analysis, mathematical methods...

Background

My background combines hands-on systems building with coursework in bioinformatics, computer science, mathematics, and philosophy.

2024

Mathematics, Open University

Coursework in mathematics.

2021 - 2023

Bioinformatics, Charles University

Coursework in computer science, biology, and chemistry.

2019 - 2021

Philosophy, Charles University

Coursework in philosophy.