The project provides tools for analyzing molecules based on physical and chemical equations.
The project currently includes the following tools:
The programs currently support molecules containing hydrogen*, carbon, oxygen, nitrogen, fluorine, and chlorine atoms.
*Note: Hydrogen atoms will be neglected in Structure Finder and MIS Calculator. Hydrogen atoms are of little significance in structure determination. Also, in isotopic substitutions, H/D substitutions are much more unpredictable than other substitutions because of the magnification of rovibrational effect on small masses.
The code is written in Swift 5.1, so it must be compiled with the Swift 5.1 compiler or newer. The executables should work in any environment that has Swift 5.1 installed.
System Version | Swift Version | Status |
---|---|---|
macOS 10.14.5 | Swift 5.1 | Verified |
macOS 10.15 beta | Swift 5.1 | Verified |
Ubuntu 18.04.2 LTS | Swift 5.1 | Verified |
Ubuntu 18.04 (WSL) | Swift 5.1 | Verified |
macOS 10.14.5 | Swift 5.0.1 | Unable to compile* |
*With Swift 5.0.1, the source code does not compile, but the executables are able to run.
To learn how to install Swift, please visit here. In the "Snapshots" section, select Swift 5.1 Development. Windows 10 users may install Swift on Windows Subsystem for Linux (WSL).
The module `JYMTAdvancedKit` for advanced calculations uses `PythonKit` as a dependency, which is hosted on GitHub. Therefore, when running directly through `swift run` or compiling through `swift build`, an internet connection may be required to fetch the external SPM dependencies.
To compile all the executables with Swift Package Manager, use
swift build -c release; mv ./.build/release/JYMT-* ./; rm -rf ./*.product
and run any executable by
./JYMT-[Tool Name]
You'll be able to find how to compile and run for each specific tool in the sections below.
Some molecule models and results produced by this set of tools can be found in this repository. The results will be updated routinely to match the latest version of the tools.
Structure Finder provides the ability to calculate the possible structures of a molecule from the known absolute values (or sign-undetermined values) of positions |x|, |y|, and |z| for each atom in the molecule. The latter data can be obtained via the single isotope substitution based on Kraitchman's equations.
Note: the program ignores hydrogen atoms in the calculation, and the hydrogen atoms are not included in the output results.
The tool uses the library `JYMTBasicKit`. `JYMTAdvancedKit` will also be used in the next version (0.4) for the calculation of SMILES data through RDKit.
swift run -c release JYMT-StructureFinder
swift build --product JYMT-StructureFinder -c release; mv ./.build/release/JYMT-StructureFinder JYMT-StructureFinder
and run the executable by
./JYMT-StructureFinder
Note: Make sure Swift 5.1 is installed in the environment. The source code does not compile with the Swift 5.0 compiler.
Test mode is available in version 0.1.3 or later. Enter test mode by passing the command line argument `-t`. For example, you can run the executable in test mode by
./JYMT-StructureFinder -t
Use test mode to test whether a known molecule will pass all the filters (with given parameters listed here) or not. In the test mode, the program will not re-sign the coordinates.
Simple mode is available in version 0.1.4 or later. Enable simple mode by passing the command line argument `-s`. In simple mode, all parameters are set to their default values and you will not be prompted to enter them. You still need to provide the input xyz file path and the optional export path.
You can use simple mode and test mode at the same time by passing the command line arguments `-s -t` or `-t -s`.
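The order-independent handling of `-s` and `-t` can be pictured with a minimal sketch. The `RunOptions` type and `parseOptions` function are hypothetical names used only for illustration; the actual tool may parse its arguments differently.

```swift
// Hypothetical option set; not taken from the project source.
struct RunOptions: Equatable {
    var testMode = false    // set by -t
    var simpleMode = false  // set by -s
}

// Scans the arguments once, so "-s -t" and "-t -s" give the same result.
func parseOptions(_ arguments: [String]) -> RunOptions {
    var options = RunOptions()
    for argument in arguments {
        switch argument {
        case "-t": options.testMode = true
        case "-s": options.simpleMode = true
        default: break  // unknown arguments are ignored in this sketch
        }
    }
    return options
}

// Drop the executable path before parsing.
let options = parseOptions(Array(CommandLine.arguments.dropFirst()))
print("test mode: \(options.testMode), simple mode: \(options.simpleMode)")
```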
The tool takes a `.xyz` file as input for the known absolute values (or sign-undetermined values) of positions |x|, |y|, and |z| for each atom in the molecule, in angstroms. The sign of each value can be incorrect, but the absolute values must match the correct structure within the allowed uncertainties. The `.xyz` file looks like below.
4
C -5.43 2.04 -0.14
O -3.44 3.38 -0.19
C -3.91 2.04 -0.11
C -3.37 1.4 1.16
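For readers who want to prepare or check such a file programmatically, here is a minimal parsing sketch. `AtomRecord` and `parseXYZ` are hypothetical names, and the sketch assumes the layout shown above: an atom count on the first line, then one `element x y z` line per atom. Lines that do not have that shape, such as a comment line, are skipped.

```swift
import Foundation

// Hypothetical record type for one atom line of the file.
struct AtomRecord {
    let element: String
    let x: Double, y: Double, z: Double
}

// Returns nil if the first line is not a count or the count does not match.
func parseXYZ(_ text: String) -> [AtomRecord]? {
    let lines = text.split(whereSeparator: { $0.isNewline }).map(String.init)
    guard let first = lines.first,
          let count = Int(first.trimmingCharacters(in: .whitespaces)) else { return nil }
    var atoms: [AtomRecord] = []
    for line in lines.dropFirst() {
        let fields = line.split(whereSeparator: { $0 == " " || $0 == "\t" }).map(String.init)
        guard fields.count == 4,
              let x = Double(fields[1]),
              let y = Double(fields[2]),
              let z = Double(fields[3]) else { continue }  // skip non-atom lines
        atoms.append(AtomRecord(element: fields[0], x: x, y: y, z: z))
    }
    return atoms.count == count ? atoms : nil
}

let sampleXYZ = """
4
C -5.43 2.04 -0.14
O -3.44 3.38 -0.19
C -3.91 2.04 -0.11
C -3.37 1.4 1.16
"""
if let atoms = parseXYZ(sampleXYZ) {
    print("parsed \(atoms.count) atoms")
}
```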
The tool prints the output to the console, and there is an option to save the results as `.xyz` files (which contain only coordinate information) and `.mol` files (which contain both coordinate information and bond information*), and the log as a `.txt` file.
You can visualize the `.xyz` files and `.mol` files with software such as Avogadro.
*Note: For `.mol` files, one or more bonds might be missing if some atoms in the molecule form a closed structure (for example, a benzene ring). This problem appears to have been solved in version 0.1.2 (after git commit 66cea02), although the actual results are still to be verified.
`(bondLengthRange.min - tolLevel, bondLengthRange.max + tolLevel)`.
`(typBondLength - tolLevel, typBondLength + tolLevel)`. The default value in these versions was 0.1.

As tested on computation-capable platforms, for molecules containing no more than 20 non-hydrogen atoms, the program is able to complete the computation in a reasonable amount of time (mostly less than 10 minutes). Some computation times and numbers of results are listed for reference below (tested with a Ryzen 7 5800X CPU and 64 GB of RAM, under commit 38b219c, on Ubuntu 20.04 LTS).
Molecule | Non-H Atoms | Structures in Result | Bond Graphs in Result | Lewis Structures in Result | Computation Time (s) |
---|---|---|---|---|---|
1,2-propanediol | 5 | 32 | 32 | 1 | 0.0125 |
Benzene | 6 | 12 | 216 | between 4 and 126 | 0.0571 |
Alpha Pinene | 10 | 132 | 148 | between 7 and 23 | 0.4187 |
Aspirin | 13 | 124 | 618 | between 19 and 120 | 8.666 |
Branched laurylphenol | 17 | 1008 | 9534 | between 6 and 140 | 599.8 |
Isomorphine | 21 | 10638 | 34267 | between 291 and 12929 | 6763.0 |
Monoacetyl-alpha-isomorphine | 24 | 7980 | 36980 | between 116 and 6253 | 37827 |
Simvastatin | 30 | 5216 | 7808 | between 43 and 599 | 45232 |
Vitamin E | 31 | 1976 | 7432 | between 20 and 1344 | 29188 |
Computation time and number of results are listed for reference below (tested with CPU i7-8700B with 32GB of RAM, under commit 0075521, on macOS 10.14.5).*
Molecule | Non-H Atoms | Structures in Result | Bond Graphs in Result | Lewis Structures in Result | Computation Time (s) |
---|---|---|---|---|---|
1,2-propanediol | 5 | 32 | 32 | 1 | 0.0139 |
Benzene | 6 | 12 | 216 | between 4 and 108 | 0.0879 |
Alpha Pinene | 10 | 132 | 148 | between 7 and 23 | 0.8936 |
Aspirin | 13 | 124 | 618 | between 19 and 120 | 18.382 |
Branched laurylphenol | 17 | 1008 | 9534 | between 6 and 140 | 1252.1 |
Isomorphine | 21 | 10638 | 34267 | between 291 and 12929 | 14713 |
Monoacetyl-alpha-isomorphine | 24 | 7980 | 36980 | between 116 and 6253 | 84726 |
*Detailed results may be found here.
Because the first atom is arbitrarily fixed, the total number of structural combinations for k non-hydrogen atoms is 8^(k-1). After algorithmic optimization, the runtime complexity of the program should be O(n log n), where n = 8^(k-1). In terms of k, the runtime complexity is therefore 2^O(k), which grows exponentially with the number of non-H atoms.
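The growth of the search space can be illustrated with a short sketch: each atom other than the fixed first one can take eight sign assignments (±x, ±y, ±z). The function names below are hypothetical and only illustrate the counting argument, not the optimized search the tool performs.

```swift
// Number of sign assignments when the first of k atoms is held fixed.
// A Double is used because the count exceeds Int64 for large k.
func signCombinationCount(nonHydrogenAtoms k: Int) -> Double {
    guard k > 1 else { return 1 }
    return (0..<(k - 1)).reduce(1.0) { count, _ in count * 8 }
}

// The eight sign variants of one atom position. If a coordinate is zero,
// some of the variants coincide.
func signVariants(x: Double, y: Double, z: Double) -> [(Double, Double, Double)] {
    var variants: [(Double, Double, Double)] = []
    for sx in [1.0, -1.0] {
        for sy in [1.0, -1.0] {
            for sz in [1.0, -1.0] {
                variants.append((sx * x, sy * y, sz * z))
            }
        }
    }
    return variants
}

for k in [5, 10, 20] {
    print("k = \(k): \(signCombinationCount(nonHydrogenAtoms: k)) combinations")
}
```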
According to the tests, the program is able to complete most of the computations for molecules containing no more than 20 non-hydrogen atoms in less than 10 minutes. The limit is extended to around 23 if the computation time is allowed to be less than one day. Under current test, the upper limit of the number of non-hydrogen atoms in the molecules is 24, which takes over 36 hours (a day and a half) to complete the computation. Also note that an extensive amount of memory is needed for computations of large molecules (20+ non-H atoms).
This is a tool that implements Kraitchman's equations (J. Kraitchman, Am. J. Phys., 21, 17 (1953)) to find the absolute values of the position vector components of each atom in the molecule. The program takes the rotational constants A, B, and C of the original molecule and of each singly substituted isotopologue.
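As an illustration of the calculation, the following is a minimal sketch of Kraitchman's equations for an asymmetric top, written in terms of planar moments. It is not the project's implementation: the type and function names are hypothetical, the conversion factor is an approximate value, and imaginary coordinates are simply rounded to zero.

```swift
import Foundation

// Approximate h / (8 π²) in amu·Å²·MHz, so that I = factor / B (assumed value).
let inertiaConversion = 505379.0

// Rotational constants in MHz (hypothetical helper type).
struct RotationalConstants { let a: Double, b: Double, c: Double }

// Planar moments P_a, P_b, P_c in amu·Å².
func planarMoments(_ rc: RotationalConstants) -> [Double] {
    let ia = inertiaConversion / rc.a
    let ib = inertiaConversion / rc.b
    let ic = inertiaConversion / rc.c
    return [(-ia + ib + ic) / 2, (ia - ib + ic) / 2, (ia + ib - ic) / 2]
}

// |a|, |b|, |c| of the substituted atom in the parent principal axis system.
// parentMass is M and massChange is Δm, both in amu.
func kraitchmanCoordinates(parent: RotationalConstants,
                           substituted: RotationalConstants,
                           parentMass: Double,
                           massChange: Double) -> [Double] {
    let p = planarMoments(parent)
    let q = planarMoments(substituted)
    let d = (0..<3).map { q[$0] - p[$0] }  // ΔP for each axis
    let mu = parentMass * massChange / (parentMass + massChange)
    return (0..<3).map { i -> Double in
        let j = (i + 1) % 3, k = (i + 2) % 3
        let squared = d[i] / mu
            * (1 + d[j] / (p[j] - p[i]))
            * (1 + d[k] / (p[k] - p[i]))
        // A negative value corresponds to an imaginary coordinate.
        return squared > 0 ? squared.squareRoot() : 0
    }
}

// Parent and first 13C line of the glycolic acid example; Δm is approximate.
let glycolicParent = RotationalConstants(a: 10696.0950, b: 4051.0323, c: 2994.6632)
let glycolicC13 = RotationalConstants(a: 10695.7310, b: 4043.1721, c: 2990.3393)
print(kraitchmanCoordinates(parent: glycolicParent, substituted: glycolicC13,
                            parentMass: 76.051, massChange: 1.00335))
```

The sketch assumes the three planar moments of the parent are distinct; a symmetric top would need the corresponding special-case formulas.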
The tool uses the library `JYMTBasicKit`.
swift run -c release JYMT-ABCTool
swift build --product JYMT-ABCTool -c release; mv ./.build/release/JYMT-ABCTool JYMT-ABCTool
and run the executable by
./JYMT-ABCTool
The tool takes a `.sabc` plain-text file as input for the rotational constants and the total mass of the original molecule, and the rotational constants and the substituted atom for each single isotopic substitution. The `.sabc` file looks like below.
10696.0950 4051.0323 2994.6632 76.051
Comment line
10695.7310 4043.1721 2990.3393 13 C
10565.6850 4033.4314 2974.8143 13 C
10517.4110 3888.3322 2891.4293 18 O
10018.3875 4035.1737 2930.5332 18 O
10695.7526 3834.7553 2874.8073 18 O
(Source: Hasegawa, Hiroshi, Osamu Ohashi, and Ichiro Yamaguchi. "Microwave spectrum and conformation of glycolic acid." Journal of Molecular Structure 82.3-4 (1982): 205-211.)
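The first line gives the rotational constants A, B, and C and the total mass of the original molecule, and the second line is a comment line. Each following line gives the rotational constants after a single isotopic substitution, the mass number of the substituted isotope (for example, `13` for carbon-13), and the substituted element. Each block of information is separated by blank spaces.

(Note: The actual file extension does not need to be `.sabc`. However, the format must be correct for the tool to work.)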
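A minimal sketch of reading this layout is shown below. The record types and `parseSABC` are hypothetical names, and the sketch assumes exactly the layout of the example: a parent line with four numbers, one comment line, then one line of five fields per substitution.

```swift
import Foundation

// Hypothetical record types for the two kinds of data lines.
struct ParentRecord { let a: Double, b: Double, c: Double, mass: Double }
struct SubstitutionRecord {
    let a: Double, b: Double, c: Double
    let massNumber: Int
    let element: String
}

func parseSABC(_ text: String) -> (parent: ParentRecord, substitutions: [SubstitutionRecord])? {
    // Keep empty lines so that an empty comment line still counts as line 2.
    let lines = text.split(omittingEmptySubsequences: false,
                           whereSeparator: { $0.isNewline }).map(String.init)
    guard lines.count >= 2 else { return nil }
    let head = lines[0].split(separator: " ").compactMap { Double(String($0)) }
    guard head.count == 4 else { return nil }
    let parent = ParentRecord(a: head[0], b: head[1], c: head[2], mass: head[3])
    var substitutions: [SubstitutionRecord] = []
    for line in lines.dropFirst(2) {  // skip the parent line and the comment line
        let f = line.split(separator: " ").map(String.init)
        guard f.count == 5,
              let a = Double(f[0]), let b = Double(f[1]), let c = Double(f[2]),
              let n = Int(f[3]) else { continue }  // ignore malformed or blank lines
        substitutions.append(SubstitutionRecord(a: a, b: b, c: c, massNumber: n, element: f[4]))
    }
    return (parent, substitutions)
}

let sampleSABC = """
10696.0950 4051.0323 2994.6632 76.051
Comment line
10695.7310 4043.1721 2990.3393 13 C
10565.6850 4033.4314 2974.8143 13 C
10517.4110 3888.3322 2891.4293 18 O
10018.3875 4035.1737 2930.5332 18 O
10695.7526 3834.7553 2874.8073 18 O
"""
if let parsed = parseSABC(sampleSABC) {
    print("parent mass \(parsed.parent.mass), \(parsed.substitutions.count) substitutions")
}
```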
The tool prints the output to the console, and there is an option to save the results as an `.xyz` file.
See discussion of the imaginary coordinate issue here.
ABC Calculator is a tool to calculate the rotational constants A, B, and C from the structural information (XYZ). It is basically the inverse process of ABC Tool.
This tool uses `JYMTAdvancedKit`, which relies on the interoperability between Swift and Python to call the NumPy library for the matrix linear algebra.
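To make the calculation concrete, here is a self-contained sketch that builds the inertia tensor about the center of mass and converts its eigenvalues to rotational constants. It is an assumption-laden illustration, not the project's method: the tool delegates the linear algebra to NumPy, whereas this sketch swaps in a closed-form eigenvalue formula for symmetric 3x3 matrices, takes the atomic masses as input, and uses an approximate conversion factor. All names are hypothetical.

```swift
import Foundation

// Eigenvalues (ascending) of a real symmetric 3x3 matrix, closed-form trigonometric method.
func symmetricEigenvalues(_ m: [[Double]]) -> [Double] {
    let p1 = m[0][1] * m[0][1] + m[0][2] * m[0][2] + m[1][2] * m[1][2]
    if p1 == 0 { return [m[0][0], m[1][1], m[2][2]].sorted() }  // already diagonal
    let q = (m[0][0] + m[1][1] + m[2][2]) / 3
    let d0 = m[0][0] - q, d1 = m[1][1] - q, d2 = m[2][2] - q
    let p = ((d0 * d0 + d1 * d1 + d2 * d2 + 2 * p1) / 6).squareRoot()
    // Determinant of (m - qI).
    let det = d0 * (d1 * d2 - m[1][2] * m[1][2])
        - m[0][1] * (m[0][1] * d2 - m[1][2] * m[0][2])
        + m[0][2] * (m[0][1] * m[1][2] - d1 * m[0][2])
    let r = max(-1, min(1, det / (2 * p * p * p)))  // clamp against rounding error
    let phi = acos(r) / 3
    let largest = q + 2 * p * cos(phi)
    let smallest = q + 2 * p * cos(phi + 2 * Double.pi / 3)
    return [smallest, 3 * q - largest - smallest, largest]
}

// Rotational constants [A, B, C] in MHz from masses (amu) and positions (Å).
func rotationalConstants(masses: [Double], positions: [[Double]]) -> [Double] {
    let total = masses.reduce(0, +)
    var com = [0.0, 0.0, 0.0]  // center of mass
    for (m, r) in zip(masses, positions) {
        for i in 0..<3 { com[i] += m * r[i] / total }
    }
    var inertia = [[Double]](repeating: [Double](repeating: 0, count: 3), count: 3)
    for (m, r) in zip(masses, positions) {
        let d = (0..<3).map { r[$0] - com[$0] }
        let r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2]
        for i in 0..<3 {
            for j in 0..<3 {
                inertia[i][j] += m * ((i == j ? r2 : 0) - d[i] * d[j])
            }
        }
    }
    // 505379.0 is an approximate h / (8 π²) in amu·Å²·MHz (assumed value).
    return symmetricEigenvalues(inertia).map { 505379.0 / $0 }
}

// Heavy-atom frame with approximate masses for carbon-12 and oxygen-16.
let frameMasses = [12.0, 15.995, 12.0, 12.0]
let framePositions = [[-5.43, 2.04, -0.14], [-3.44, 3.38, -0.19],
                      [-3.91, 2.04, -0.11], [-3.37, 1.4, 1.16]]
print(rotationalConstants(masses: frameMasses, positions: framePositions))
```

A linear arrangement of atoms has a zero moment of inertia, so the first constant would be infinite in this sketch.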
Single/multiple isotopic substitutions are also calculated (including hydrogen atoms) based on the structural information. The program will assume the most common isotopologue as the parent molecule, and use the second most common isotope for each element in isotopic substitutions. This tool uses the same calculation module for isotopic substitutions as MIS Calculator.
The tool uses the libraries `JYMTBasicKit` and `JYMTAdvancedKit`.
*Note: The tool also uses the `NumPy` library with Python 3, so Python 3 and `NumPy` must be installed in the environment.
swift run -c release JYMT-ABCCalculator
swift build --product JYMT-ABCCalculator -c release; mv ./.build/release/JYMT-ABCCalculator JYMT-ABCCalculator
and run the executable by
./JYMT-ABCCalculator
Note: Make sure Swift 5.1 is installed in the environment. The source code does not compile with the Swift 5.0 compiler.
The tool takes a `.xyz` file as input for the Cartesian coordinates x, y, and z of each atom in the molecule, in angstroms. Unlike the input of Structure Finder, the sign of each value must be correct because the tool takes the Cartesian coordinates in the file as the actual structure. The `.xyz` file looks like below.
4
C -5.43 2.04 -0.14
O -3.44 3.38 -0.19
C -3.91 2.04 -0.11
C -3.37 1.4 1.16
Note: If the molecule has hydrogen atoms in the structure, then the correct Cartesian coordinates information of hydrogen atoms must be included in the xyz file.
The tool prints the output directly to the console, and there is an option to save the results as a `.txt` file. The output contains the calculated rotational constants in megahertz (MHz).
`3`, then the program will perform single, double, and triple isotopic substitutions.
`0`, then no substitutions will be performed.

MIS Calculator is a tool to calculate the rotational constants for multiple isotopic substitutions. The data comes from single isotopic substitutions (`.sabc` file), while `.xyz` and `.mol` inputs are planned to be added in the future.
The program predicts the rotational constants under multiple isotopic substitutions from the given single isotopic substitution information (or data of the molecular structure). The outcomes are not expected to be unique if the structural information is not determined in the data source (for example, from `.sabc` or unsigned `.xyz` files), but the program should perform as well as Structure Finder in terms of reduction efficiency.
Using the same module as ABC Tool, the program implements Kraitchman's equations (J. Kraitchman, Am. J. Phys., 21, 17 (1953)) to find the absolute values of the position vector components of each atom in the molecule.
In principle, this tool is a convenient combination of ABC Tool, Structure Finder, and ABC Calculator in series. It reflects a typical lab workflow that utilizes this set of tools.
The program uses `JYMTAdvancedKit`, which relies on the interoperability between Swift and Python to call the `NumPy` library for the matrix linear algebra.
This tool is still in early development. Its calculations might not be accurate, and research into the physical theories behind the program is in progress to improve the results.
The tool uses the libraries `JYMTBasicKit` and `JYMTAdvancedKit`.
*Note: The tool also uses the `NumPy` library with Python 3, so Python 3 and `NumPy` must be installed in the environment.
swift run -c release JYMT-MISCalculator
swift build --product JYMT-MISCalculator -c release; mv ./.build/release/JYMT-MISCalculator JYMT-MISCalculator
and run the executable by
./JYMT-MISCalculator
The tool takes a `.sabc` plain-text file as input for the rotational constants and the total mass of the original molecule, and the rotational constants and the substituted atom for each single isotopic substitution. The `.sabc` file looks like below.
10696.0950 4051.0323 2994.6632 76.051
Comment line
10695.7310 4043.1721 2990.3393 13 C
10565.6850 4033.4314 2974.8143 13 C
10517.4110 3888.3322 2891.4293 18 O
10018.3875 4035.1737 2930.5332 18 O
10695.7526 3834.7553 2874.8073 18 O
(Source: Hasegawa, Hiroshi, Osamu Ohashi, and Ichiro Yamaguchi. "Microwave spectrum and conformation of glycolic acid." Journal of Molecular Structure 82.3-4 (1982): 205-211.)
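The first line gives the rotational constants A, B, and C and the total mass of the original molecule, and the second line is a comment line. Each following line gives the rotational constants after a single isotopic substitution, the mass number of the substituted isotope (for example, `13` for carbon-13), and the substituted element. Each block of information is separated by blank spaces.

(Note: The actual file extension does not need to be `.sabc`. However, the format must be correct for the tool to work.)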
The tool prints the output to the console, and there is an option to save the results as a `.txt` file.
`3`, then the program will perform single, double, and triple isotopic substitutions.

There is a known problem (in both MIS Calculator and ABC Tool, as they rely on the same algorithm): when the input data is not perfectly accurate, evaluating Kraitchman's equations may require the square root of a negative number. When this problem arises, the program prints a message like the following to the console
WARNING: Imaginary coordinate 0.0161i appeared. Rounded to zero. (ABC dev: 7.63kHz)
and rounds the corresponding coordinate to zero. The ABC deviation indicates how large an error in the input rotational constants would be needed to make the coordinate exactly zero.
If the imaginary number is small, the problem is not serious for structure determination (when used with Structure Finder). However, it is serious when predicting isotopic substitutions, because the rotational constants and related parameters are extremely sensitive to the accuracy of the data source. Because the program rounds the imaginary coordinates to zero, the single substitution reconstructed by the program might not match the input data. For example, for the following parent molecule and single isotopic substitution
[Parent Molecule]
PM A: 8572.0553 B: 3640.1063 C: 2790.9666 Mass: 76.09
[Single Isotopic Substitution]
...
C2 A: 8555.9200 B: 3631.1660 C: 2787.5640 Isotope: 13
...
the following warning will be raised during the calculation
WARNING: Imaginary coordinate 0.0697i appeared. Rounded to zero. (ABC dev: 252.13kHz)
and the following coordinates will be passed to the later calculations after structure filtering
...
C2 [-0.47733, 0.00000, -0.34254]
...
As observed, the rounded coordinates don't reflect the actual position of the atom in the molecule because when the program reconstructs the single isotopic substitution, it yields a different result than the input source:
...
C2 A: 8555.225504 B: 3631.165991 C: 2787.489864
...
In practice, this problem is worrying because it makes the predictions from the tool less reliable even when no imaginary coordinates appear. The physical theory underlying the program might cause this "imaginary self-contradiction", as some effects, including vibrations and centrifugal distortions, are not fully considered in the calculations. Further research is in progress to solve the problem, or at least to minimize the error it introduces.
Current research indicates that imaginary coordinates are common and often occur when the atom is near a principal axis or a principal plane of the molecule. The root of the problem is that the change in the rovibrational effect on the molecule before and after the isotopic substitution is neglected. This is an assumption of Kraitchman's equations, which are the underlying theory of the algorithm used in this program. The structure derived from Kraitchman's equations, usually named r_s, is one of the most popular derived structures in rotational spectroscopy because it requires less experimental data and gives a relatively accurate estimate of the structure. However, the reliability of r_s decreases as the atom gets closer to a principal axis or a principal plane, which is the major issue presented here.
Other popular alternatives, including r_m, r_c, and r_m^ρ, are usually used to take the change in the rovibrational effect and inertial defects into account. But these parameters require a massive amount of data (for example, r_m requires a complete set of single isotopic substitutions or more), which is unrealistic in an actual lab environment, especially for large molecules. Therefore, we must look for an intermediate parameter between r_s and r_m that requires less data than r_m but provides a more accurate estimate of the structure than r_s. More research on this topic is in progress.
MIS Tool is a tool to deduce the relative positions of the atoms in a molecule based on single substitution data and mutual double substitution data. It is still under early development.
This set of tools is affiliated with the Patterson Group at the University of California, Santa Barbara, and was built with assistance from Professor Dave Patterson. Our appreciation extends to the colleagues who helped build this set of tools. Special thanks to Larry Li for assistance with runtime optimization when writing Structure Finder.
(In no particular order)
Note: This release is non-production ready. The software is subject to change, and results yielded from the pre-released software may be inaccurate.
This is the fourth alpha release of JYMolecule-Swift. A scoring system for Structure Finder has been added in this pre-release. The scoring system is intended to provide more flexibility in the determination of molecular structures. Note that the scoring system might make the runtime 10%-20% longer, depending on the molecule.
This release includes the following tools:
The executables should be run in environments with Swift 5.1 installed. They might be compatible with Swift 5.0 or earlier versions, but the source code cannot be compiled with Swift 5.0.
On other systems, if the executables don't work, you may try compiling the source code on the system itself.