ChemFeat and MolPred¶

Python packages for developing predictors of molecular properties.

Links¶

  • ChemFeat source code
  • ChemFeat on PyPI
  • MolPred source code
  • MolPred on PyPI

Author¶

Jan-Michael (Mike) Rye

  • Inria Research Engineer, SED-LYS, Lyon.
  • Permanent member of AIstroSight.
  • Creator of ChemFeat & MolPred.

ChemFeat¶

A simple Python package and command-line tool for generating feature vectors from molecules.

Usage Overview¶

  • Provide a list of molecules as InChI strings.
  • Select and configure feature sets using a simple YAML configuration file.
  • Use ChemFeat to generate a pandas DataFrame or CSV file with the calculated features.

Notable Features¶

  • Already supports ~70 feature sets thanks to RDKit and PaDEL-Descriptor.
  • Modular design facilitates addition of new feature sets.
  • Calculated features are cached in a database to avoid redundant calculations when rerunning code.
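The caching idea can be illustrated with a minimal sketch. This is not ChemFeat's actual database layer: the table layout, the `cached_feature` helper and the toy `string_length` "feature" are all hypothetical and only show how a lookup keyed on the InChI and feature name avoids recomputation.

```python
import sqlite3

def cached_feature(conn, inchi, feature_name, compute):
    """Return the cached value for (inchi, feature_name), computing it once if missing."""
    # Hypothetical schema: one row per molecule and feature.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        "inchi TEXT, name TEXT, value REAL, PRIMARY KEY (inchi, name))"
    )
    row = conn.execute(
        "SELECT value FROM features WHERE inchi = ? AND name = ?",
        (inchi, feature_name),
    ).fetchone()
    if row is not None:
        # Cache hit: skip the calculation entirely.
        return row[0]
    value = compute(inchi)
    conn.execute(
        "INSERT INTO features (inchi, name, value) VALUES (?, ?, ?)",
        (inchi, feature_name, value),
    )
    conn.commit()
    return value

# Toy stand-in for an expensive descriptor; it records every real calculation.
calls = []

def string_length(inchi):
    calls.append(inchi)
    return float(len(inchi))

conn = sqlite3.connect(":memory:")
paracetamol = "InChI=1S/C8H9NO2/c1-6(10)9-7-2-4-8(11)5-3-7/h2-5,11H,1H3,(H,9,10)"
first = cached_feature(conn, paracetamol, "length", string_length)
second = cached_feature(conn, paracetamol, "length", string_length)
```

The second call is answered from the database, so `string_length` runs only once. Pointing several runs at the same database file would share the cached values between them.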

Command-Line Tool¶

Documentation

Example¶

# features.yaml

# QED feature calculator.
- name: qed

# RDK descriptor feature calculator.
- name: rdkdesc

# inchis.csv

InChi,name
"InChI=1S/C8H9NO2/c1-6(10)9-7-2-4-8(11)5-3-7/h2-5,11H,1H3,(H,9,10)","paracetamol"
"InChI=1S/C13H18O2/c1-9(2)8-11-4-6-12(7-5-11)10(3)13(14)15/h4-7,9-10H,8H2,1-3H3,(H,14,15)","ibuprofen"

$ chemfeat calculate features.yaml inchis.csv features.csv

# features.csv

InChi,qed__ALERTS,qed__ALOGP,qed__AROM,qed__HBA,qed__HBD,qed__MW,qed__PSA,qed__ROTB,rdkdesc__FpDensityMorgan1,rdkdesc__FpDensityMorgan2,rdkdesc__FpDensityMorgan3,rdkdesc__MaxAbsPartialCharge,rdkdesc__MaxPartialCharge,rdkdesc__MinAbsPartialCharge,rdkdesc__MinPartialCharge,rdkdesc__NumRadicalElectrons,rdkdesc__NumValenceElectrons
"InChI=1S/C8H9NO2/c1-6(10)9-7-2-4-8(11)5-3-7/h2-5,11H,1H3,(H,9,10)",2,2.0000999999999998,1,2,2,151.16500000000002,52.82000000000001,1,1.2727272727272727,1.8181818181818181,2.272727272727273,0.5079642937129114,0.18214293782620056,0.18214293782620056,-0.5079642937129114,0,58
"InChI=1S/C13H18O2/c1-9(2)8-11-4-6-12(7-5-11)10(3)13(14)15/h4-7,9-10H,8H2,1-3H3,(H,14,15)",0,3.073200000000001,1,2,1,206.28499999999997,37.3,4,1.2,1.7333333333333334,2.1333333333333333,0.4807885019257389,0.3101853515323108,0.3101853515323108,-0.4807885019257389,0,82
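In the output above, each column name appears to follow a `<feature set>__<feature>` pattern. Assuming that convention holds for all feature sets, the columns can be grouped by feature set with the standard library alone. The sketch below uses a subset of the columns from the example output; `group_columns` is a hypothetical helper, not part of ChemFeat.

```python
import csv
import io

# A subset of the columns from the features.csv example above.
FEATURES_CSV = '''InChi,qed__ALERTS,qed__ALOGP,rdkdesc__NumValenceElectrons
"InChI=1S/C8H9NO2/c1-6(10)9-7-2-4-8(11)5-3-7/h2-5,11H,1H3,(H,9,10)",2,2.0000999999999998,58
"InChI=1S/C13H18O2/c1-9(2)8-11-4-6-12(7-5-11)10(3)13(14)15/h4-7,9-10H,8H2,1-3H3,(H,14,15)",0,3.073200000000001,82
'''

def group_columns(header):
    """Map each feature-set prefix to the feature names found under it."""
    groups = {}
    for column in header:
        feature_set, separator, feature = column.partition("__")
        if separator:
            # Columns without the "__" separator (such as "InChi") are skipped.
            groups.setdefault(feature_set, []).append(feature)
    return groups

reader = csv.DictReader(io.StringIO(FEATURES_CSV))
groups = group_columns(reader.fieldnames)
rows = list(reader)
```

Here `groups` maps `qed` and `rdkdesc` to their respective feature names, which is convenient when selecting or dropping a whole feature set downstream.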

Python API¶

Documentation

from chemfeat.database import FeatureDatabase
from chemfeat.features.manager import FeatureManager

# A list of feature specifications, either loaded from a user-provided
# YAML file or set programmatically.
feat_specs = [
    {'name': 'qed'},
    {'name': 'rdkdesc'},
    {'name': 'rdkfp', 'size': 2048}
]

# An iterable of InChI strings, such as a column from a loaded CSV file.
inchis = [
    "InChI=1S/C8H9NO2/c1-6(10)9-7-2-4-8(11)5-3-7/h2-5,11H,1H3,(H,9,10)",
    "InChI=1S/C13H18O2/c1-9(2)8-11-4-6-12(7-5-11)10(3)13(14)15/h4-7,9-10H,8H2,1-3H3,(H,14,15)"
]

# Create the database object. This can point to a common database to centralize
# caching of calculated features.
feat_db = FeatureDatabase('features.sqlite')

# Create the feature manager object.
feat_man = FeatureManager(feat_db, feat_specs)

# Calculate the features and retrieve them as a pandas DataFrame.
feat_dataframe = feat_man.calculate_features(inchis, return_dataframe=True)
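The `inchis` iterable does not have to be written by hand. As a minimal sketch, the InChI column of the `inchis.csv` file from the command-line example could be read with the standard library; `read_inchis` is a hypothetical helper, and the `InChi` column name is taken from that example file.

```python
import csv
import io

# Contents of inchis.csv from the command-line example.
INCHIS_CSV = '''InChi,name
"InChI=1S/C8H9NO2/c1-6(10)9-7-2-4-8(11)5-3-7/h2-5,11H,1H3,(H,9,10)","paracetamol"
"InChI=1S/C13H18O2/c1-9(2)8-11-4-6-12(7-5-11)10(3)13(14)15/h4-7,9-10H,8H2,1-3H3,(H,14,15)","ibuprofen"
'''

def read_inchis(handle, column="InChi"):
    """Return the values of the InChI column from an open CSV file handle."""
    return [row[column] for row in csv.DictReader(handle)]

# With a real file: `with open("inchis.csv", newline="") as handle: ...`
inchis = read_inchis(io.StringIO(INCHIS_CSV))
```

The resulting list can be passed to the feature manager in place of the hard-coded list shown above.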

ChemFeat Conclusion¶

  • Already supports a wide range of feature sets.
  • Easy to add new feature sets.
  • Simple command-line tool and Python API.
  • Available on PyPI.

MolPred¶

Combine Hydronaut and ChemFeat to predict molecular properties using ML & DL.

Hydronaut¶

An ML & DL framework for managing and optimizing hyperparameters with Hydra and tracking results with MLflow.

Presentations and tutorials.

Recent feature: Python decorators!

MolPred Overview¶

Trains ML & DL models using feature vectors generated by ChemFeat.

Highlights¶

  • All the benefits of Hydronaut (parameterized via YAML configuration files, automatic hyperparameter sweeping and optimization, systematic result tracking with MLflow).
  • All the benefits of ChemFeat (large number of features, easily extensible to add more feature sets).
  • Automatic visualization of numeric and categorical features (examples).
  • Easy to define user models and scorers.

Basic Usage¶

  1. Define a model as a subclass of ModelBase.
    • ModelBase subclass example
  2. Modify the configuration file template.
    • Configuration file example
  3. Provide a CSV file with InChIs, prediction targets, and optional additional features.
  4. Run hydronaut-run.

MolPred Conclusion¶

  • Relatively simple to adapt a model to the framework.
  • Everything can be configured with YAML (model, feature sets, various metrics).
  • Easy to explore & optimize hyperparameters thanks to Hydronaut (test different models, feature sets, etc.).
  • Automatic visualization of features.
  • All runs are tracked (parameters, metrics, models, artifacts, plots, code version, etc.).
  • Retrieve models via MLflow for testing and prediction.
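Retrieving a tracked model could look like the sketch below. It relies on MLflow's `runs:/<run_id>/<artifact_path>` URI scheme and `mlflow.pyfunc.load_model`. The `model` artifact path and the assumption that the model was logged in a pyfunc-compatible flavor are guesses and should be checked against the actual MolPred run; both helper functions are hypothetical.

```python
def model_uri(run_id, artifact_path="model"):
    """Build an MLflow runs:/ URI for a model logged in the given run."""
    # "model" is an assumed artifact path; check the run's artifacts in the MLflow UI.
    return f"runs:/{run_id}/{artifact_path}"

def load_tracked_model(run_id, artifact_path="model"):
    """Load a logged model through MLflow's generic pyfunc interface."""
    # Imported lazily so the URI helper can be used without MLflow installed.
    import mlflow.pyfunc
    return mlflow.pyfunc.load_model(model_uri(run_id, artifact_path))

# Usage, with a run ID copied from the MLflow UI:
#   model = load_tracked_model("<run_id>")
#   predictions = model.predict(feature_dataframe)
```

The input passed to `predict` would need the same feature columns that were used for training.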
