For the general public, the term “chemistry” probably brings to mind a person in a white lab coat, working with glass beakers filled with colorful solutions. This picture mostly describes a 19th-century chemist. The present-day chemist works with far more advanced equipment than mere glass beakers and deals with processes far more complex than mixing solutions! To study a reaction, today’s chemist routinely uses various kinds of instruments, such as spectrometers (UV, IR, NMR, EPR, mass spectrometers, etc.), X-ray diffractometers (XRD), microscopes (scanning electron microscope, transmission electron microscope, scanning tunneling microscope, atomic force microscope, etc.) and many others. Of course, not every chemist uses all of these instruments, nor does every chemical reaction require all of them. The specific instruments a chemist uses regularly are determined by the experiments he/she conducts, and are therefore highly field-specific. So, at first glance, it might seem that there is no new general instrument (like the lab coat or the glass beaker) that connects all new-generation chemists. However, if we look a little more deeply, we immediately find that every new-generation chemist is connected to one specific instrument: a computer. This is obviously true from the technological viewpoint, because new-generation chemists operate (or at least interact with) most of their instruments through a computer, plot their results using various software, write their research articles on it, etc. But, as mentioned above, this is just a technological connection. In this article, I will introduce you to a much stronger and more direct connection between chemists and computers, one that has led to an entirely new branch of chemistry in which chemical reactions are performed inside a computer!
Science in general, and chemistry in particular, have seen exponential growth in the last two centuries. Due to this phenomenal growth, the tree of chemistry has grown many branches. Some of these branches, such as organic chemistry and analytical chemistry, are quite well known to the general public in India, mainly because of the notable rise in job opportunities in these fields over the last few decades. They may also be popular because of their applications in the petroleum, pharmaceutical, and other industries. On the other hand, quite a few important branches of chemistry have revolutionized the field and changed the workflow of various industries (including the pharmaceutical, petroleum and energy industries), yet remain completely alien even to students pursuing bachelor's or master's degrees in chemistry. The aim of this article is to introduce you to one of these revolutionary branches of chemistry, namely, computational chemistry.
Computational chemistry (CC) is itself a vast branch, and it can be used to study all forms of matter: solids, liquids, gases, and plasmas. It can also be applied to study matter in both static and dynamic situations. For example, CC can be used to study the properties of diamond at very low temperatures (static) or the reaction of a molecule colliding with a surface (dynamic). It is commonly used alongside experiments to explain observations, such as: Why is the absorption spectrum of a compound red-shifted by a particular substituent? Why does a protein bind to a specific antibody? Which path does an electron take when it is transferred from a donor moiety to an acceptor moiety? How does interaction with a laser change the magnetic properties of a material? How does an electron move on a femtosecond time scale in a solar cell? Thus, CC is useful for understanding the microscopic details of a system under study. Until the last decade, CC was mostly employed as a tool complementary to experiments for probing such details. However, with drastic improvements in computing power and the development of accurate and efficient computational techniques, the role of CC is slowly changing from a complementary tool to an independent and robust predictive tool.
During the past eight years or so, there has been a lot of research in which computation alone was employed to predict new materials with interesting electronic, magnetic and optical properties (for example, many new layered materials were predicted through high-throughput screening), and many of these predictions were later confirmed by experimentalists. Furthermore, CC allows us to study materials under extreme conditions, such as very high or low temperatures and pressures, with comparatively little effort. For experimentalists, even reaching and maintaining such harsh conditions is a daunting task, let alone conducting experiments under them. Owing to this advantage, CC was able to predict various interesting metastable materials, such as TiN₂ (a new superhard material), Ti₃N₄ (the only known semiconducting titanium nitride) and many more, which were synthesized by experimentalists only recently (although they were predicted as early as 2003). Despite the huge and ever-increasing success of CC in predicting interesting new materials, there are certain grey areas where its predictions can go wrong (for example, strongly correlated systems such as the oxides of certain d- and f-block elements). However, it is well known that these shortcomings stem mainly from inadequacies of the theory employed. Before going deeper, let us first try to understand how CC works.
Computational chemistry is based on the theoretical principles of many other important fields, such as quantum mechanics, molecular mechanics, statistical mechanics, the special theory of relativity and solid-state chemistry/physics. Here, theoretical principles mean the set of postulates and equations on which the theory is built. For example, in the case of the special theory of relativity, one of the postulates is the invariance of the speed of light, c, and one of the equations describing the theory is E = mc².
Once we know the equations governing a theory, we can use a computer to solve them numerically. Thus, computational chemistry (or, for that matter, computational physics or computational biology) is essentially about solving the equations that govern the chemistry of a system using a computer. Which properties of a system we can determine depends on which equations we solve. For example, solving the Schrödinger equation tells us about the quantum chemical properties of a system; solving Newton's equations of motion describes the classical motion of the nuclei in a system; the relativistic aspects of a system are obtained by solving the Dirac equation; and so on. Thus, once we know the solutions of these equations (for example, the wave function in quantum mechanics or the partition function in statistical mechanics), we can, in principle, derive all the physical properties of the system. So, when we say that a computational chemist performs reactions in a computer, what we actually mean is that he/she simulates a chemical environment (for example, by generating the structure of the compound) and finds solutions to some of the above equations (which equations depends on the properties of the system he/she is interested in).
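To make the idea of "solving the equations numerically" concrete, here is a minimal Python sketch that integrates Newton's equation of motion for a single particle on a harmonic spring, a crude stand-in for a vibrating bond. It uses the velocity Verlet scheme, a common integrator in molecular dynamics; the choice of integrator is an assumption, since the article does not name one, and the spring constant, mass and time step are hypothetical values in arbitrary units.

```python
import math

def velocity_verlet(x0, v0, force, mass, dt, n_steps):
    """Integrate Newton's equation m*a = F(x) with the velocity Verlet scheme.

    Returns a list of (time, position, velocity) tuples.
    """
    x, v = x0, v0
    a = force(x) / mass
    trajectory = [(0.0, x, v)]
    for step in range(1, n_steps + 1):
        x = x + v * dt + 0.5 * a * dt * dt  # update position using current velocity and acceleration
        a_new = force(x) / mass             # force (hence acceleration) at the new position
        v = v + 0.5 * (a + a_new) * dt      # update velocity with the averaged acceleration
        a = a_new
        trajectory.append((step * dt, x, v))
    return trajectory

# Hypothetical toy model: one "atom" on a harmonic spring (arbitrary units).
k, m = 1.0, 1.0

def spring_force(x):
    """Hooke's law: the force pulls the particle back towards x = 0."""
    return -k * x

omega = math.sqrt(k / m)
period = 2 * math.pi / omega
n_steps = 1000
dt = period / n_steps

traj = velocity_verlet(x0=1.0, v0=0.0, force=spring_force, mass=m, dt=dt, n_steps=n_steps)
t_end, x_end, v_end = traj[-1]
print(f"x after one period: {x_end:.6f} (exact: {math.cos(omega * t_end):.6f})")
```

Because the exact solution of this toy problem is known (x(t) = cos ωt for these starting conditions), we can check that the numerical trajectory returns to its starting point after one period. Real simulations apply the same idea to thousands of atoms with far more complicated forces.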
The value of chemical simulations is easy to appreciate through a simple analogy. Consider the question: what is your weight on Pluto? How would you answer it? Would you travel to Pluto and weigh yourself, or would you sit on Earth and compute your weight with a formula (weight on a planet = mass of the body × surface gravity of the planet)? Both approaches will give you the right answer. But the latter is much quicker and more efficient: I can ask the same question to thousands of people and get the answers very quickly, whereas with the former I would need to send each one of them to Pluto. Essentially, a computational chemist follows the latter path, and in many cases (not always!), computational chemistry is a cheap, quick and environmentally friendly way to answer the question at hand.
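The "compute instead of travel" approach from the analogy fits in a few lines of Python. This is only an illustration of the formula above; the surface gravities are rounded reference values (about 9.81 m/s² for Earth and 0.62 m/s² for Pluto) and the masses are hypothetical.

```python
# Approximate surface gravities in m/s^2 (rounded reference values, treated as assumptions)
SURFACE_GRAVITY = {"Earth": 9.81, "Pluto": 0.62}

def weight_newtons(mass_kg, planet):
    """Weight on a planet = mass of the body * surface gravity of the planet (in newtons)."""
    return mass_kg * SURFACE_GRAVITY[planet]

def scale_reading_kg(mass_kg, planet):
    """What a bathroom scale calibrated on Earth would display on another planet."""
    return weight_newtons(mass_kg, planet) / SURFACE_GRAVITY["Earth"]

# Hypothetical masses of a few people; the same loop works just as well for thousands.
for mass in [55.0, 70.0, 82.5]:
    print(f"{mass:5.1f} kg -> {weight_newtons(mass, 'Pluto'):6.2f} N on Pluto "
          f"(an Earth scale would read {scale_reading_kg(mass, 'Pluto'):.2f} kg)")
```

Answering the question for one more person costs one more multiplication, not one more trip to Pluto, which is exactly the advantage the analogy describes.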
Let us consider another example, this time from chemistry, to appreciate this further. Suppose you would like to understand how the acidity of an organic compound changes when a substituent is added. To study this experimentally, you would need a bulk amount of the parent compound (say, phenol), and you would need to introduce each substituent onto it using substitution reactions. For each type of substituent (say, -CH3, -NH2, -NO2, etc.), you would generally have to follow a different reaction procedure to obtain the required product. By then, you would have consumed a lot of chemicals, such as the solvents needed to purify and filter each of the substituted compounds. After obtaining each substituted product, you would need to run further experiments (such as litmus tests or acid-base titrations) to determine their properties.
Some of these techniques are destructive (in the sense that the sample is altered by the experiment and therefore cannot be reused). Because of this, you may need to prepare enough of each substituted compound (and of the parent compound) to repeat measurements and check the accuracy of your results. All of this makes one point clear: to determine the order of the electron-withdrawing strength of these substituents, you need to perform many experiments and, in the process, consume a lot of real chemicals (both while preparing the compounds and while testing them!). Some of these chemicals can even be hazardous to humans and the environment.
On the other hand, if we decide to find the answer using a computer, we would first draw each of these structures using a software package (much like drawing structures on paper) and then solve the Schrödinger equation to find the wave function of each structure. Knowing the wave function, we can calculate the energy needed to remove the acidic proton (the deprotonation energy), or any other property of interest, for each of these systems and thereby predict their relative acidities. Once again, this is clearly a quick, cheap and environmentally friendly way to answer our question. The convenience of performing an experiment inside a computer can be further appreciated by imagining what an experimentalist would have to go through to measure the same electron-withdrawing behavior at extreme temperatures or pressures. For a computational chemist, it is just a matter of changing a few numbers in the input of his/her program!
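The last step, going from computed energies to an acidity ranking, can be sketched in Python as below. This is a minimal illustration under stated assumptions: the energies would come from a quantum chemistry program (not shown), the pKa relation pKa = ΔG / (RT ln 10) is applied to relative free energies on the assumption that systematic errors largely cancel between similar compounds, and the numbers in the `placeholder` dictionary are made up purely to show the mechanics. They are not results of any calculation.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)

def deprotonation_energy(e_acid, e_anion, e_proton=0.0):
    """Energy to remove the acidic proton: E(A-) + E(H+) - E(HA).

    All energies must be in the same units. A bare proton has no electrons,
    so its electronic energy is zero, hence the default e_proton=0.0.
    """
    return e_anion + e_proton - e_acid

def rank_by_acidity(deprot_energies):
    """Sort compound names from strongest to weakest acid.

    A smaller deprotonation energy means the proton leaves more easily,
    i.e. a stronger acid.
    """
    return sorted(deprot_energies, key=deprot_energies.get)

def pka_shift(dg_substituted, dg_parent, temperature=298.15):
    """pKa(substituted) - pKa(parent) from deprotonation free energies in kJ/mol.

    Uses pKa = dG / (RT ln 10) and assumes systematic errors largely cancel
    when two similar compounds are compared (an assumption).
    """
    return (dg_substituted - dg_parent) / (R_KJ_PER_MOL_K * temperature * math.log(10))

# Placeholder relative free energies in kJ/mol: made-up numbers to show the
# mechanics only, NOT results of any calculation.
placeholder = {"phenol": 0.0, "4-methylphenol": 2.0, "4-nitrophenol": -15.0}
print("strongest -> weakest acid:", rank_by_acidity(placeholder))
for name, dg in placeholder.items():
    print(f"{name}: pKa shift {pka_shift(dg, placeholder['phenol']):+.2f}")
```

At room temperature, a difference of roughly 5.7 kJ/mol in deprotonation free energy corresponds to about one pKa unit, which gives a feel for how precise such calculations need to be.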
From the above discussion, the value of performing calculations to predict the chemical nature of a system should be apparent. Indeed, because CC is cost-effective, various industries have been employing it for different purposes. For example, pharmaceutical companies use CC to screen many compounds to find the molecules most likely to bind a particular protein pocket, or they screen several polymorphs of a drug molecule to find the most stable polymorph at room temperature. Such screening can substantially reduce the cost of developing a drug (which could, in turn, allow companies to sell it at a lower price, if they wish to), because the final tests with real experiments need to be performed on only a small set of compounds instead of a whole library. With recent drastic improvements in artificial intelligence (such as robust machine learning and deep learning algorithms), there has been a huge surge in the use of CC in the pharmaceutical industry. By combining artificial intelligence with CC, millions of molecules are being screened to find potential candidates for medicinal use. Given these trends, we can expect CC to play a paramount role in drug discovery in the very near future. (Have a look at websites such as Computational Resources for Drug Discovery and ChemBridge, and read about the subject of “cheminformatics” to learn more about drug discovery using computers.)
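A screening funnel of the kind described above can be sketched in Python: first discard molecules that are unlikely to behave as oral drugs, then keep only the few best-scoring ones for real experiments. In this sketch the drug-likeness filter is Lipinski's "rule of five" (a well-known cheminformatics heuristic, chosen here as an illustrative assumption), while the `Candidate` records, their descriptor values and the `predicted_score` field are all hypothetical stand-ins for the output of real descriptor and docking or machine-learning tools.

```python
import heapq
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    mol_weight: float       # molecular weight in g/mol
    log_p: float            # octanol-water partition coefficient (logP)
    h_donors: int           # number of hydrogen-bond donors
    h_acceptors: int        # number of hydrogen-bond acceptors
    predicted_score: float  # hypothetical model output; lower = stronger predicted binding

def lipinski_violations(c):
    """Count how many of Lipinski's 'rule of five' criteria a candidate violates."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])

def screen(library, top_k):
    """Keep drug-like candidates (at most one violation), then the top_k by predicted score."""
    drug_like = [c for c in library if lipinski_violations(c) <= 1]
    return heapq.nsmallest(top_k, drug_like, key=lambda c: c.predicted_score)

# Hypothetical library; in practice this could hold millions of entries.
library = [
    Candidate("cand-A", 320.4, 2.1, 2, 5, -8.3),
    Candidate("cand-B", 612.7, 6.3, 4, 11, -10.9),  # three violations, filtered out
    Candidate("cand-C", 455.0, 4.2, 1, 7, -9.1),
    Candidate("cand-D", 280.3, 1.5, 3, 4, -6.0),
]
for c in screen(library, top_k=2):
    print(c.name, c.predicted_score)
```

Note that cand-B has the best predicted score but is removed by the cheap filter before any expensive step; ordering cheap filters before costly ones is what lets such funnels handle very large libraries.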
Similar screening procedures are also performed in the petroleum industry. Here, various surfaces of several metal catalysts are screened to find the best catalyst surface for a specific chemical reaction (remember that the (100) surface of a metal generally has a different catalytic activity from its (111) surface). Screening is also performed to understand the efficiency of a reaction at various adsorbate coverages and at interfaces between different catalysts. Similar applications exist in the energy industry, where many materials are screened to find the best conductors, insulators, photovoltaic materials, supercapacitor materials, electrolytes, battery materials, etc. The interested reader is advised to consult the following websites (and a few others), where thousands of compounds have already been screened for the above-mentioned applications:
- The Materials Project
- The NOMAD Laboratory
- Automated Interactive Infrastructure and Database for Computational Science (AiiDA)
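One simple way to rank catalyst surfaces, sketched below in Python, follows the Sabatier principle: a good catalyst binds a reaction intermediate neither too strongly nor too weakly, so surfaces whose computed adsorption energy lies closest to an optimal value are ranked first. Everything here is an illustrative assumption: the metals, facets, adsorption energies and the optimum of -0.50 eV are made up, whereas in practice the energies would come from DFT calculations and the optimum from an analysis of the specific reaction.

```python
def rank_surfaces(adsorption_energies, optimum, tolerance=None):
    """Rank catalyst surfaces by how close their adsorption energy is to an optimum.

    Sabatier principle: binding that is too weak does not activate the molecule,
    binding that is too strong poisons the surface. Optionally drop surfaces
    farther than `tolerance` from the optimum.
    """
    distance = {s: abs(e - optimum) for s, e in adsorption_energies.items()}
    ranked = sorted(distance, key=distance.get)
    if tolerance is not None:
        ranked = [s for s in ranked if distance[s] <= tolerance]
    return ranked

# Hypothetical adsorption energies (eV) of one intermediate on a few facets
# of made-up metals; these are NOT real data.
energies = {
    ("metal X", "(100)"): -1.20,
    ("metal X", "(111)"): -0.65,
    ("metal Y", "(100)"): -0.40,
    ("metal Y", "(111)"): -0.10,
    ("metal Z", "(111)"): -1.80,
}
best = rank_surfaces(energies, optimum=-0.50, tolerance=0.30)
print(best)
```

In this made-up example, the (100) and (111) facets of the same metal land far apart in the ranking, which mirrors the point above that different surfaces of one metal can behave very differently.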
Before ending this article, it is important to understand the limitations of CC. Since computational chemistry solves problems using a computer, the accuracy of the results depends heavily on the method chosen to solve the problem. For example, if you use classical mechanics to study the electronic properties of a system, the results will obviously be completely wrong, because electronic properties are governed by the principles of quantum mechanics. Although this may look like an issue related only to the knowledge of the computational chemist working on the problem, as explained below, that is not always the case.
For example, consider the case where one needs to solve the Schrödinger equation for a system of a thousand carbon atoms to determine its ground-state energy. Here, one can use any of several quantum chemical methods, such as the Hartree-Fock (HF) method, density functional theory (DFT) or coupled-cluster theory. These methods differ in accuracy: HF is typically the least accurate and coupled-cluster theory the most accurate. Despite knowing this, no reasonable computational chemist would even attempt to solve the Schrödinger equation for this 1000-atom system using coupled-cluster theory; instead, he/she would generally rely on a cheaper method such as DFT. This is because of the enormous computational cost of coupled-cluster theory. But, as you might have guessed, moving away from coupled-cluster theory generally means that the predicted energy becomes less accurate. Despite these accuracy limitations, methods like DFT are routinely used to predict trends, since they have proven quite useful in many cases (and also because there is often no other choice!).
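Why coupled-cluster theory is out of reach for 1000 atoms can be estimated from how the cost of each method grows with system size N. The Python sketch below uses the formal textbook scaling exponents (roughly N⁴ for HF, N⁶ for CCSD and N⁷ for CCSD(T), two common members of the coupled-cluster family). These exponents are assumptions for illustration: real programs often do better through approximations, DFT is formally N³ to N⁴ depending on the functional (the upper end is used here), and prefactors are ignored.

```python
# Formal (textbook) scaling exponents p, with cost ~ N**p for system size N.
# Assumptions: real programs often beat these with approximations, and DFT is
# formally N**3 to N**4 depending on the functional; the upper end is used here.
FORMAL_SCALING = {"HF": 4, "DFT": 4, "CCSD": 6, "CCSD(T)": 7}

def relative_cost(n_small, n_large, exponent):
    """Factor by which the cost grows when the system size goes from n_small to n_large."""
    return n_large ** exponent / n_small ** exponent

# How much more expensive is a 1000-atom calculation than a 10-atom one?
for method, p in FORMAL_SCALING.items():
    print(f"{method:8s} ~N^{p}: about {relative_cost(10, 1000, p):.0e} times more expensive")
```

Under these assumptions, a CCSD(T) calculation that takes one second for 10 atoms would take about 10¹⁴ seconds (roughly three million years) for 1000 atoms, while the same jump costs "only" a factor of about 10⁸ for HF or DFT.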
Another important factor that dictates the accuracy of the results is the numerical accuracy of the simulation. Even if the method is accurate, poor numerical accuracy will lead to wrong predictions. The importance of numerical accuracy can be understood by recalling high-school lessons on integration. You might remember that computing a definite integral is equivalent to finding the area under a curve. For simple curves such as a sine curve, we can integrate by applying a formula and substituting the limits. However, for complex curves, there is often no such formula, and we must rely on numerical methods such as Simpson's rule or the trapezoidal rule (of course, these methods also work for simple curves such as a sine curve).
When using these rules, we divide the range of integration into small intervals, compute the area under the curve in each interval (for example, by approximating it as a rectangle, or as a trapezoid in the trapezoidal rule), and finally sum all these small areas to obtain the area under the complex curve. Such an approximation is good only when the curve is divided into many small pieces. If each piece is very small, the result is close to the exact value, because the area under each piece is, for all practical purposes, that of a rectangle. On the other hand, if we use larger intervals, we get poorer results, because the area under each large piece no longer resembles a rectangle. In a computer, a related source of error is the precision with which numbers are stored: single precision keeps roughly 7 significant decimal digits, whereas double precision keeps roughly 15 to 16, so using double precision is loosely analogous to using smaller intervals. There are, of course, a few other issues, such as the scalability of a method (whether it can run efficiently on many CPUs simultaneously) and its memory consumption, but they are beyond the scope of this article. Thus, both the accuracy of the theory and the accuracy of the numerical simulation must be considered when judging a result predicted using computational chemistry.
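The rectangle idea can be tried out directly. The Python sketch below uses the midpoint rule (each rectangle's height is the curve's value at the middle of its interval, one of several possible rectangle rules) to integrate sin(x) from 0 to π, whose exact value is 2, and shows the error shrinking as the intervals get smaller. It also rounds the number 0.1 to single precision using the standard `struct` module to show how much coarser single precision is than Python's default double precision.

```python
import math
import struct

def midpoint_rule(f, a, b, n):
    """Approximate the integral of f over [a, b] with n rectangles of equal width.

    Each rectangle's height is f evaluated at the middle of its interval.
    """
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

exact = 2.0  # integral of sin(x) from 0 to pi
for n in (4, 16, 64, 256):
    approx = midpoint_rule(math.sin, 0.0, math.pi, n)
    print(f"n = {n:4d}  result = {approx:.8f}  error = {abs(approx - exact):.2e}")

print(f"0.1 stored in double precision: {0.1:.17f}")
print(f"0.1 stored in single precision: {to_float32(0.1):.17f}")
```

For the midpoint rule, halving the interval width cuts the error by roughly a factor of four, so each fourfold increase in n in the loop above should reduce the error by roughly a factor of sixteen.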
With this brief introduction to computational chemistry, I hope you now have an overview of how it works and how it is useful to various industries, as well as some of its strengths and limitations in solving chemical problems. Finally, I hope that the next time you think about a chemist, you will also add a computer to the picture!
Fun fact: The Nobel Prizes in Chemistry for 1998 and 2013 were awarded to chemists working in computational chemistry. Try to find out who received them and what they contributed to the field!