First, let’s consider a parallelepiped volume of free space representing a periodically repeating unit cell with a plane wave passing through it at an angle, as shown below:
The incident wavevector, \bf{k}, has component magnitudes: k_x = k_0 \sin(\alpha_1) \cos(\alpha_2), k_y = k_0 \sin(\alpha_1) \sin(\alpha_2), and k_z = k_0 \cos(\alpha_1) in the global coordinate system. This problem can be modeled by using Periodic boundary conditions on the sides of the domain and Port boundary conditions at the top and bottom. The most complex part of the problem set-up is defining the direction and polarization of the incoming and outgoing wave.
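As a quick check of the component formulas above, here is a minimal Python sketch that evaluates them. The function name `incident_wavevector` and the example wavelength are placeholders chosen for illustration and are not part of the COMSOL interface; k_0 = 2\pi/\lambda_0 is the usual free-space wavenumber.

```python
import math

def incident_wavevector(k0, alpha1, alpha2):
    """Return (kx, ky, kz) for elevation angle alpha1 and azimuthal angle alpha2 (radians)."""
    kx = k0 * math.sin(alpha1) * math.cos(alpha2)
    ky = k0 * math.sin(alpha1) * math.sin(alpha2)
    kz = k0 * math.cos(alpha1)
    return kx, ky, kz

wavelength = 1.0e-6                 # assumed free-space wavelength in meters
k0 = 2.0 * math.pi / wavelength     # free-space wavenumber
kx, ky, kz = incident_wavevector(k0, math.radians(30.0), math.radians(45.0))
```

Whatever the two angles are, the three components should always recombine to a vector of length k_0, which is a convenient sanity check when typing these expressions into a model.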
Although the COMSOL software is flexible enough to allow any definition of base coordinate system, in this posting, we will pick one and use it throughout. The direction of the incident light is defined by two angles, \alpha_1 and \alpha_2; and two vectors, \bf{n}, the outward pointing normal of the modeling space and \bf{a_1}, a vector in the plane of incidence. The convention we choose here is to align \bf{a_1} to the global x-axis and align \bf{n} with the global z-axis. Thus, the angle between the wavevector of the incoming wave and the global z-axis is \alpha_1, the elevation angle of incidence, where -\pi/2 < \alpha_1 < \pi/2, with \alpha_1 = 0 meaning normal incidence. The angle between the incident wavevector and the global x-axis is the azimuthal angle of incidence, \alpha_2, which lies in the range -\pi/2 < \alpha_2 \leq \pi/2. As a consequence of this definition, positive values of both \alpha_1 and \alpha_2 imply that the wave is traveling in the positive x- and y-direction.
To use the above definition of direction of incidence, we need to specify the \bf{a_1} vector. This is done by picking a Periodic Port Reference Point, which must be one of the corner points of the incident port. The software uses the in-plane edges coming out of this point to define two vectors, \bf{a_1} and \bf{a_2}, such that \bf{a_1 \times a_2 = n}. In the figure below, we can see the four cases of \bf{a_1} and \bf{a_2} that satisfy this condition. Thus, the Periodic Port Reference Point on the incoming side port should be the point at the bottom left of the x-y plane, when looking down the z-axis and the surface. By choosing this point, the \bf{a_1} vector becomes aligned with the global x-axis.
Now that \bf{a_1} and \bf{a_2} have been defined on the incident side due to the choice of the Periodic Port Reference Point, the port on the outgoing side of the modeling domain must also be defined. The normal vector, \bf{n}, points in the opposite direction, hence the choice of the Periodic Port Reference Point must be adjusted. None of the four corner points will give a set of \bf{a_1} and \bf{a_2} that align with the vectors on the incident side, so we must choose one of the four points and adjust our definitions of \alpha_1 and \alpha_2. By choosing a periodic port reference point on the output side that is diametrically opposite the point chosen on the input side and applying a \pi/2 rotation to \alpha_2, the direction of \bf{a_1} is rotated to \bf{a_1'}, which points in the opposite direction of \bf{a_1} on the incident side. As a consequence of this rotation, \alpha_1 and \alpha_2 are switched in sign on the output side of the modeling domain.
Next, consider a modeling domain representing a dielectric half-space with a refractive index contrast between the input and output port sides that causes the wave to change direction, as shown below. From Snell’s law, we know that the angle of refraction is \beta=\arcsin \left( n_A\sin(\alpha_1)/n_B \right). This lets us compute the direction of the wavevector at the output port. Also, note that this relationship holds even if there are additional layers of dielectric sandwiched between the two half-spaces.
In summary, to define the direction of a plane wave traveling through a unit cell, we first need to choose two points, the Periodic Port Reference Points, which are diametrically opposite on the input and output sides. These points define the vectors \bf{a_1} and \bf{a_2}. As a consequence, \alpha_1 and \alpha_2 on the input side can be defined with respect to the global coordinate system. On the output side, the direction angles become: \alpha_{1,out} = -\arcsin \left( n_A\sin(\alpha_1)/n_B \right) and \alpha_{2,out}=-\alpha_2 + \pi/2.
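The two output-side expressions can be wrapped in a small helper. This is only a sketch of the formulas quoted in the summary above; the function name and the check for total internal reflection are additions made for illustration.

```python
import math

def output_port_angles(alpha1, alpha2, n_a, n_b):
    """Angles to enter on the output port, following the summary formulas.

    alpha1, alpha2: input-side elevation and azimuthal angles (radians)
    n_a, n_b: refractive indices on the input and output sides
    """
    s = n_a * math.sin(alpha1) / n_b
    if abs(s) > 1.0:
        # Snell's law has no real solution: no propagating transmitted wave
        raise ValueError("total internal reflection for these inputs")
    alpha1_out = -math.asin(s)
    alpha2_out = -alpha2 + math.pi / 2.0
    return alpha1_out, alpha2_out

a1_out, a2_out = output_port_angles(math.radians(30.0), 0.0, n_a=1.0, n_b=1.5)
```

For n_A = n_B the first expression reduces to \alpha_{1,out} = -\alpha_1, which matches the free-space case discussed earlier.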
The incoming plane wave can be in one of two polarizations, with either the electric or the magnetic field parallel to the x-y plane. All other polarizations, such as circular or elliptical, can be constructed from a linear combination of these two. The figure below shows the case of \alpha_2 = 0, with the magnetic field parallel to the x-y plane. For the case of \alpha_2 = 0, the magnetic field amplitude at the input and output ports is (0,1,0) in the global coordinate system. As the beam is rotated such that \alpha_2 \ne 0, the magnetic field amplitude becomes (\sin(\alpha_2), \cos(\alpha_2),0). For the orthogonal polarization, the electric field magnitude at the input can be defined similarly. At the output port, the field components in the x-y plane can be defined in the same way.
So far, we’ve seen how to define the direction and polarization of a plane wave that is propagating through a unit cell around a dielectric interface. You can see an example model of this in the Model Gallery that demonstrates an agreement with the analytically derived Fresnel Equations.
Next, let’s examine what happens when we introduce a structure with periodicity into the modeling domain. Consider a plane wave with \alpha_1, \alpha_2 \ne 0 incident upon a periodic structure as shown below. If the wavelength is sufficiently short compared to the grating spacing, one or several diffraction orders can be present. To understand these diffraction orders, we must look at the plane defined by the \bf{n} and \bf{k} vectors as well as the plane defined by the \bf{n} and \bf{k \times n} vectors.
First, looking normal to the plane defined by \bf{n} and \bf{k}, we see that there can be a transmitted 0^{th} order mode with direction defined by Snell’s law as described above. There is also a 0^{th} order reflected component. There also may be some absorption in the structure, but that is not pictured here. The figure below shows only the 0^{th} order transmitted mode. The spacing, d, is the periodicity in the plane defined by the \bf{n} and \bf{k} vectors.
For short enough wavelengths, there can also be higher-order diffracted modes. These are shown in the figure below, for the m=\pm1 cases.
The condition for the existence of these modes is that:
for: m=0,\pm 1, \pm 2,…
For m=0 , this reduces to Snell’s law, as above. For \beta_{m\ne0}, if the difference in path lengths equals an integer number of wavelengths in vacuum, then there is constructive interference and a beam of order m is diffracted by angle \beta_{m}. Note that there need not be equal numbers of positive and negative m-orders.
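To get a feel for which m-orders can propagate, the sketch below enumerates them for the transmitted side. It assumes the common grating-equation form n_B \sin(\beta_m) = n_A \sin(\alpha_1) + m \lambda_0 / d; the sign convention for m is an assumption made here, and the function name and example numbers are hypothetical.

```python
import math

def transmitted_orders(wavelength0, d, n_a, n_b, alpha1, m_max=10):
    """Return {m: beta_m} for the transmitted orders that can propagate.

    Assumed convention: n_b*sin(beta_m) = n_a*sin(alpha1) + m*wavelength0/d.
    An order propagates only if the resulting sine lies in [-1, 1].
    """
    orders = {}
    for m in range(-m_max, m_max + 1):
        s = (n_a * math.sin(alpha1) + m * wavelength0 / d) / n_b
        if abs(s) <= 1.0:
            orders[m] = math.asin(s)
    return orders

# Example: vacuum wavelength 0.5 um, period 1 um, air onto n = 1.5, 20 degree incidence
orders = transmitted_orders(0.5e-6, 1.0e-6, n_a=1.0, n_b=1.5, alpha1=math.radians(20.0))
```

With these example numbers the propagating orders are not symmetric about m = 0, which illustrates the remark above that the numbers of positive and negative m-orders need not be equal.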
Next, we look along the plane defined by the \bf{n} and \bf{k} vectors. That is, we rotate our viewpoint around the z-axis such that the incident wavevector appears to be coming in normally to the surface. The diffraction orders in this plane are indexed as the n-order beams. Note that the periodic spacing, w, will be different in this plane and that there will always be equal numbers of positive and negative n-orders.
COMSOL will automatically compute these m,n \ne 0 order modes during the set-up of a Periodic Port and define listener ports so that it is possible to evaluate how much energy gets diffracted into each mode.
Last, we must consider that the wave may experience a rotation of its polarization as it gets diffracted. Thus, each diffracted order consists of two orthogonal polarizations, the In-plane vector and Out-of-plane vector components. Looking at the plane defined by \bf{n} and the diffracted wavevector \bf{k_D}, the diffracted field can have two components. The Out-of-plane vector component is the diffracted beam that is polarized out of the plane of diffraction (the plane defined by \bf{n} and \bf{k}), while the In-plane vector component has the orthogonal polarization. Thus, if the In-plane vector component is non-zero for a particular diffraction order, this means that the incoming wave experiences a rotation of polarization as it is diffracted. Similar definitions hold for the n \ne 0 order modes.
Consider a periodic structure on a dielectric substrate. As the incident beam comes in at \alpha_1, \alpha_2 \ne 0 and there are higher diffracted orders, the visualization of all of the diffracted orders can become quite involved. In the figure below, the incoming plane wave direction is shown as a yellow vector. The n=0 diffracted orders are shown as blue arrows for diffraction in the positive z-direction and cyan arrows for diffraction into the negative z-direction. Diffraction into the n \ne 0 order modes is shown in red and magenta for the positive and negative directions. There can be diffraction into each of these directions and the diffracted wave can be polarized either in or out of the plane of diffraction. The plane of diffraction itself is visualized as a circular arc. Note that the plane of diffraction for the n \ne 0 modes is different in the positive and negative z-direction.
All of the ports are automatically set up when defining a periodic structure in 3D. They capture these various diffracted orders and can compute the fields and relative phase in each order. Understanding the meaning and interpretation of these ports is helpful when modeling periodic structures.
As we just learned, the fully coupled approach to solving a steady-state nonlinear problem actually uses the exact same damped Newton-Raphson algorithm used to solve a single physics nonlinear problem. Although this algorithm does converge well for many cases, it can fail or converge very slowly if the choice of initial conditions is poor. It should come as no surprise then that the techniques we have already looked at, such as Load Ramping and Nonlinearity Ramping, are just as valid when applied to a multiphysics problem. In fact, there is really nothing to add to these techniques; they can be used equivalently.
There is one new variation of the nonlinearity ramping technique, and that is to ramp the coupling between the physics. Numerically, it is in fact identical to the nonlinearity ramping technique already discussed, but conceptually it is the magnitude of the couplings between the physics that is ramped up, rather than the magnitude of the nonlinearity in a single physics. The only difficulty is choosing, and implementing, the term that should be ramped. Luckily, most multiphysics problems have quite obvious couplings between the physics, which can be found simply by writing out the governing equations and boundary conditions and examining how the material properties and loads are dependent upon the variables being solved for.
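To make the idea of ramping a coupling concrete, here is a minimal sketch on a made-up two-unknown problem: a resistor driven by a current source whose conductance drops with temperature, and a heat balance between Joule heating and convective cooling. The ramp parameter `p` scales the temperature dependence of the conductance from 0 (no feedback from temperature to the electrical problem) to 1 (full coupling), and each converged solution is reused as the starting guess for the next value of `p`. All names and numbers are hypothetical, and the Newton-Raphson loop is undamped for brevity.

```python
def newton_vt(p, V, T, I0=10.0, G0=100.0, a=0.04, h=0.05, T0=293.15,
              tol=1e-10, max_iter=50):
    """Newton-Raphson for the toy voltage/temperature problem at coupling strength p."""
    for _ in range(max_iter):
        G = G0 / (1.0 + p * a * (T - T0))   # conductance; p scales its T-dependence
        dG = -p * a * G * G / G0            # derivative dG/dT
        f1 = G * V - I0                     # current balance residual
        f2 = h * (T - T0) - G * V * V       # heat balance residual
        j11, j12 = G, dG * V                # Jacobian entries
        j21, j22 = -2.0 * G * V, h - dG * V * V
        det = j11 * j22 - j12 * j21
        dV = (-f1 * j22 + j12 * f2) / det   # Cramer's rule for J * delta = -f
        dT = (-j11 * f2 + j21 * f1) / det
        V, T = V + dV, T + dT
        if max(abs(dV) / (1.0 + abs(V)), abs(dT) / (1.0 + abs(T))) < tol:
            return V, T
    raise RuntimeError("Newton-Raphson did not converge at p = %g" % p)

V, T = 0.0, 293.15                          # starting guess for the first ramp step
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    V, T = newton_vt(p, V, T)               # previous solution seeds the next step
```

This toy problem is mild enough that the ramp is not strictly needed; the point is only the mechanics of introducing a ramp parameter on the coupling term and stepping it up to one.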
The most important thing to remember is that the underlying algorithm used to solve a fully coupled multiphysics problem is exactly the same as the algorithm used to solve a nonlinear single physics problem. Keeping this in mind, you will find that fully coupled multiphysics problems really do not pose any additional conceptual hurdles beyond understanding how the physics in the model interact with each other.
On the other hand, the segregated approach can lead to a variety of different solution strategies that can greatly accelerate solution convergence, and significantly affect the amount of memory needed to solve the problem. To understand this, let’s make a flowchart of the different multiphysics solution techniques. Consider the same problem from our previous blog post about a busbar that heats up due to current flow and experiences thermal stresses.
First, the fully coupled solver starts from an initial guess and applies Newton-Raphson iterations until the solution has converged:
When solving such a problem, you will get a Convergence Plot, which shows the error estimate decreasing between Newton-Raphson iterations. Ideally, the error should go down monotonically. If the solution does not converge, then start investigating ramping the loads, the nonlinearities, or the multiphysics couplings. This approach will almost always require a more memory-intensive direct solver to solve the linear system of equations in each Newton-Raphson step.
Now, compare the fully coupled approach to the segregated approach, which solves each physics sequentially until convergence:
You will get a different kind of convergence plot for such a problem, one that shows the error associated with each physics you are solving. Each of the physics can use the optimal solver, either the direct or the less memory-intensive iterative, to solve the linear system of equations. Each segregated step can be a nonlinear problem on its own, and can be solved to a desired tolerance, and with custom damping, as appropriate for the particular combination of physics problem being solved.
With this solution method, you will get at least two convergence plots, one for the iterative solver(s) possibly used within a segregated step, and a second for the overall convergence of the segregated approach:
The above plot shows the decrease in error for each physics. Although more iterations may be required for the same problem, each loop through the segregated solution approach can be much faster than the Newton-Raphson step required for the fully coupled approach. You can also get a little bit more information out of this plot: if only one or two physics are not converging, then you will want to check the set-up of these first.
One thing you may recall about this problem is that the temperature change is driven by the resistive heating from the current, and the current distribution depends upon the electrical conductivity, which is temperature-dependent. That is, the voltage and temperature solutions are bi-directionally coupled. On the other hand, although the thermal strain and the Young’s Modulus are dependent upon temperature, the voltage and temperature solutions do not depend upon the displacements or stresses. That is, there is a uni-directional coupling from the thermal problem to the structural problem. We can immediately see that there is an even more efficient way to solve this problem. We can solve the voltage and temperature problem first and subsequently solve for the displacements:
So, we can see that there are (at least) three different ways of solving this problem: fully coupled; segregated, assuming couplings between all of the physics; or segregated with a sequential solution step that takes advantage of the uni-directional coupling between temperature and displacements. When solving a multiphysics problem, COMSOL will assume coupling between all physics, and try to choose the optimal fully coupled or segregated approach, based on the physics and the problem size. Of course, it is always instructive to go into the solver settings to see what settings the software has chosen.
This series of postings has been designed to give you an understanding of the algorithms used in COMSOL to solve single physics and multiphysics linear and nonlinear steady-state problems. Issues such as meshing, accuracy, and convergence have been covered. With this information, you should be able to more confidently address the solutions to your models of this type.
Let’s start by considering a very simple steady-state multiphysics problem: A coupling of steady-state electric current flow through a metal busbar, heat transfer in the bar, and structural deformations. Resistive heating arises due to the current flow, which raises the temperature of the bar and causes it to expand. In addition, the temperature rise will be significant enough that the electrical, thermal, and structural material property variations with temperature must be considered. We want to find the current flow, temperature fields, and deformations and stresses under steady-state conditions. The figure below shows a schematic of the problem being solved.
The multiphysics problem at hand.
There are three governing partial differential equations being solved here. First off, the equation that describes the voltage distribution within the domain is:
After discretizing via the finite element method, we can write a system of equations as:
where the subscript, _{V}, denotes the voltage unknowns, and the system matrix, \mathbf{K}_V, is dependent upon the temperature unknowns, \mathbf{u}_T. Assuming that the voltage distribution is known, then the volumetric resistive heating can be computed from:
where \bf{E}, the electric field, is: -\nabla V. This heat source shows up in the governing equation for temperature:
And this equation gives us the system of equations:
Once we have the temperature distribution within the domain, we can solve for the structural displacements:
where the elasticity matrix, \bf{C}, is computed based on the temperature-dependent Young’s Modulus, E(T). The imposed thermal strain is \epsilon_{\Delta T} = \alpha(T-T_0) and the strain is \epsilon = 1/2 [{\nabla \mathbf{u}^\mathbf{T}_D + \nabla \mathbf{u}_D}]. The system of equations that solve for the displacements is written as:
where the subscript, {_D}, indicates the displacement unknowns.
We can combine these systems of equations together:
We can see by examination that this is a nonlinear problem, and as we learned earlier, this requires that we find the solution by taking Newton-Raphson iterations until we get convergence:
That is really all there is to it! There is no conceptual difference at all between solving a single physics nonlinear problem and solving a coupled physics problem. Everything that we have already learned about solving nonlinear single physics problems, including all of the discussions about damping, load and nonlinearity ramping, as well as meshing, is just as valid for solving a multiphysics problem.
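The fully coupled iteration can be sketched on a zero-dimensional stand-in for the busbar with one voltage, one temperature, and one displacement unknown. The model below is invented for illustration (a current source feeding a temperature-dependent conductance, convective cooling, and a spring loaded by thermal expansion), so every parameter name and value is an assumption, and no damping is applied.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fully_coupled_solve(I0=10.0, G0=100.0, a=0.004, h=0.05, T0=293.15,
                        k=1.0e6, alpha=1.7e-5, L=0.1, tol=1e-10, max_iter=50):
    """Newton-Raphson on all three unknowns (V, T, D) at once."""
    V, T, D = 0.0, T0, 0.0                       # starting guess
    for iteration in range(1, max_iter + 1):
        G = G0 / (1.0 + a * (T - T0))            # temperature-dependent conductance
        dG = -a * G * G / G0                     # dG/dT
        f = [G * V - I0,                         # current balance
             h * (T - T0) - G * V * V,           # cooling minus Joule heating
             k * D - k * alpha * L * (T - T0)]   # spring force minus thermal load
        J = [[G, dG * V, 0.0],                   # full Jacobian, off-diagonals included
             [-2.0 * G * V, h - dG * V * V, 0.0],
             [0.0, -k * alpha * L, k]]
        dV, dT, dD = solve_linear(J, [-r for r in f])
        V, T, D = V + dV, T + dT, D + dD
        step = max(abs(dV) / (1.0 + abs(V)),
                   abs(dT) / (1.0 + abs(T)),
                   abs(dD) / (1.0 + abs(D)))
        if step < tol:
            return V, T, D, iteration
    raise RuntimeError("Newton-Raphson did not converge")

V, T, D, iterations = fully_coupled_solve()
```

Even in this tiny example the Jacobian is non-symmetric, because the temperature enters the electrical equation differently from how the voltage enters the thermal equation.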
But it is also important to understand a (sometimes very serious) drawback to the above approach. During the Newton-Raphson iteration, we need to evaluate the derivative, \mathbf{f’(u}^{i}), so let’s write that out:
where the comma derivative notation is used, e.g.: \mathbf{K}_{V,T}=\partial \mathbf{K} _{V}(\mathbf{u}_T)/\partial \mathbf{u}_T.
Clearly, the above matrix is non-symmetric, and this can lead to a problem: If the system matrix is not definite, then we may need to use the more memory-intensive direct solvers. (Although iterative solvers, with the right choice of preconditioner, can solve a wider class of problems, they cannot be guaranteed to handle all cases.) Solving such a multiphysics problem with a direct solver will be both memory- and time-intensive.
However, there is an alternative. The above method, called a Fully Coupled approach, assumes that all of the couplings between the physics must be considered at the same time. In fact, for the purposes of solving many types of multiphysics problems, we can neglect these off-diagonal terms during the solution, and solve using a more memory- and time-efficient Segregated approach.
The Segregated approach treats each physics sequentially, using the results of the previously solved physics to evaluate the loads and material properties for the next physics being solved. So, using the above example, we first take a Newton-Raphson iteration for the voltage solution:
where, for the first iteration, we must have a starting guess for voltage and temperature ( \mathbf{u}_V^{i=0} , \mathbf{u}_T^{i=0} ). The material properties are evaluated using the initial conditions given for the temperature field. Next, the temperature solution is evaluated:
where, for the first iteration, i=0, the initial conditions given for the temperature field are used to evaluate the materials properties, \mathbf{K}_T(\mathbf{u}_T^{i=0}) , but the loads are evaluated based upon the voltage solution that was just computed: \mathbf{b}_T(\mathbf{u}_V^{i=1}) . Similarly, the displacement field is solved:
where the material properties and loads for the structural problem are computed using the temperature field computed above.
These iterations are then continued: voltage, temperature, and displacement are repeatedly computed in sequence. The algorithm is continued until convergence is achieved, as defined earlier.
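The sequence of substeps can be sketched on a made-up zero-dimensional analogue of the busbar: one voltage, one temperature, and one displacement unknown, with a conductance that falls with temperature. Each substep uses the most recent values from the other physics, with the material property evaluated at the previous temperature as described above. The model, parameter names, and values are all assumptions for illustration.

```python
def segregated_solve(I0=10.0, G0=100.0, a=0.004, h=0.05, T0=293.15,
                     alpha=1.7e-5, L=0.1, tol=1e-10, max_iter=100):
    """Solve voltage, temperature, and displacement one after another until T settles."""
    def conductance(T):
        return G0 / (1.0 + a * (T - T0))         # temperature-dependent material property

    V, T, D = 0.0, T0, 0.0                       # starting guess
    for iteration in range(1, max_iter + 1):
        T_old = T
        G = conductance(T_old)                   # property from the previous temperature
        V = I0 / G                               # voltage substep
        T = T0 + G * V * V / h                   # temperature substep, load from new V
        D = alpha * L * (T - T0)                 # displacement substep, load from new T
        if abs(T - T_old) <= tol * (1.0 + abs(T)):
            return V, T, D, iteration
    raise RuntimeError("segregated iterations did not converge")

V, T, D, iterations = segregated_solve()
```

In this sketch each substep is a single scalar solve, so the pass through all three physics is very cheap, but the error only shrinks by a fixed factor per pass, so several passes are needed before the temperature stops changing.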
The great advantage to the above approach is that the optimal iterative solver can be used in each linear substep. Not only are you now solving a smaller problem in each substep, but you can also use a solver that is more memory-efficient and generally solves faster. Although the segregated approach generally does require more iterations until convergence, each iteration takes significantly less time than one iteration of the fully coupled approach.
The algorithm used by the segregated solver for a model composed of n number of different physics is:
For general multiphysics problems, you will still have to choose the order in which the physics are solved, but the software has default suggestions as to an appropriate sequence for all built-in multiphysics interfaces. COMSOL Multiphysics will provide default linear solver settings for each physics in the segregated sequence.
When the segregated approach is applicable, it will converge to the same answer as the fully coupled approach. The segregated approach will usually take more iterations to converge; however, the memory and time requirements for each sub-step will be lower, so the total solution time and memory usage can be lower with the segregated approach.
In this blog post, we have outlined the two classes of algorithms used to solve multiphysics problems — the Fully Coupled and the Segregated approach. The Fully Coupled approach is essentially identical to the Newton-Raphson method already developed for solving single physics nonlinear problems. It was shown to be very memory-intensive, but is useful, and generally needed, for multiphysics problems that have very strong interactions between the various physics being solved. On the other hand, the Segregated approach assumes that each physics can be solved independently, and will iterate through the various physics in the model until convergence.
Microwave plasmas are sustained when electrons can gain enough energy from an electromagnetic wave as it penetrates into the plasma. The physics of a microwave plasma is quite different depending on whether the TE mode (out-of-plane electric field) or the TM mode (in-plane electric field) is propagating. In both cases, it is not possible for the electromagnetic wave to penetrate into regions of the plasma where the electron density exceeds the critical electron density (around 7.6×10^{16} 1/m^{3} for 2.45 GHz). The critical electron density is given by the formula:
n_c = \frac{\epsilon_0 m_e \omega^2}{e^2}
where \epsilon_0 is the permittivity of free space, m_e is the electron mass, \omega is the angular frequency, and e is the electron charge. This corresponds to the point at which the angular frequency of the electromagnetic wave is equal to the plasma frequency. The pressure range for microwave plasmas is very broad. For electron cyclotron resonance (ECR) plasmas, the pressure can be on the order of around 1 Pa, while for non-ECR plasmas, the pressure typically ranges from 100 Pa up to atmospheric pressure. The power can range from a few watts to several kilowatts. Microwave plasmas are popular due to the cheap availability of microwave power.
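The critical density is easy to evaluate for any drive frequency. The sketch below assumes the standard expression obtained by setting the plasma frequency equal to the angular frequency, n_c = \epsilon_0 m_e \omega^2 / e^2, and uses CODATA values for the constants; the function name is a placeholder.

```python
import math

EPS0 = 8.8541878128e-12    # permittivity of free space (F/m)
M_E = 9.1093837015e-31     # electron mass (kg)
Q_E = 1.602176634e-19      # elementary charge (C)

def critical_electron_density(frequency_hz):
    """Electron density (1/m^3) at which the plasma frequency equals 2*pi*frequency_hz."""
    omega = 2.0 * math.pi * frequency_hz
    return EPS0 * M_E * omega**2 / Q_E**2

n_c = critical_electron_density(2.45e9)
```

Because the result scales with the square of the frequency, doubling the drive frequency raises the critical density by a factor of four.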
In order to understand the nuances associated with modeling microwave plasmas, it is necessary to go over some of the theory as to how the discharge is sustained. The plasma characteristics on the microwave time scale are separated from the longer term plasma behavior, which is governed by the ambipolar fields.
In the Plasma Module, the electromagnetic waves are computed in the frequency domain and all other variables in the time domain. In order to justify this approach, we start from Maxwell’s equations, which state that:
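\nabla \times \mathbf{\tilde{E}} = -\frac{\partial \mathbf{\tilde{B}}}{\partial t} (2)
\nabla \times \mathbf{\tilde{H}} = \mathbf{\tilde{J}_p} + \frac{\partial \mathbf{\tilde{D}}}{\partial t} (3)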
where \mathbf{\tilde{E}} is the electric field (V/m), \mathbf{\tilde{B}} is the magnetic flux density (T), \mathbf{\tilde{H}} is the magnetic field (A/m), \mathbf{\tilde{J}_p} is the plasma current density (A/m^{2}), and \mathbf{\tilde{D}} is the electrical displacement (C/m^{2}). The tilde is used to denote that the field is varying in time with frequency \omega/2 \pi. The plasma current density can be approximated by this expression:
\mathbf{\tilde{J}_p} = -e n_e \mathbf{\tilde{v}_e} (4)
where e is the unit charge (C), n_e is the electron density (1/m^{3}), and \mathbf{\tilde{v}_e} is the mean electron velocity under the following two assumptions (Ref 1):
The mean electron velocity on the microwave time scale, \tilde{\mathbf{v}}_e, is obtained by assuming a Maxwellian distribution function and taking a first moment of the Boltzmann equation (Ref 2):
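m_e \frac{\partial \mathbf{\tilde{v}}_e}{\partial t} = -e \mathbf{\tilde{E}} - m_e \nu_m \mathbf{\tilde{v}}_e (5)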
where m_e is the electron mass (kg) and \nu_m is the momentum transfer frequency between the electrons and background gas (1/s). As pointed out in Ref 1, the equations are linear, so we can take a Fourier transform of the equation system. Taking a Fourier transform of equation (5) gives:
j \omega m_e \mathbf{\bar{v}}_e = -e \mathbf{\bar{E}} - m_e \nu_m \mathbf{\bar{v}}_e (6)
where the tilde has been replaced by a bar to reflect the fact that we are now referring to the amplitude of the fields. Multiplying both sides by -e n_e and re-arranging gives:
-e n_e \mathbf{\bar{v}}_e = \frac{n_e e^2}{m_e (\nu_m + j\omega)} \mathbf{\bar{E}} (7)
or, in a simpler form:
\mathbf{\bar{J}}_p = \sigma \mathbf{\bar{E}} (8)
where
\sigma = \frac{n_e e^2}{m_e (\nu_m + j\omega)} (9)
Equations (2) and (3) can be re-arranged by taking the time derivative of (3) and substituting in (2):
(10)
where \mu is the permeability, \sigma is given in equation (9) above, and the plasma relative permittivity is set to one. The equation could also be recast so that the relative permittivity is complex-valued and the plasma conductivity is zero (Ref 3). The convention employed throughout the Plasma Module is that the plasma conductivity is given by equation (9) and the plasma relative permittivity is set to 1.
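For a feel of the numbers, the sketch below evaluates a complex plasma conductivity of the form \sigma = n_e e^2 / (m_e (\nu_m + j\omega)). That form is what results from Fourier-transforming a linear momentum balance with collisional drag and multiplying by -e n_e; the e^{j\omega t} sign convention, the function name, and the example values are assumptions made here.

```python
import math

M_E = 9.1093837015e-31     # electron mass (kg)
Q_E = 1.602176634e-19      # elementary charge (C)

def plasma_conductivity(n_e, nu_m, frequency_hz):
    """Complex conductivity (S/m), assuming sigma = n_e e^2 / (m_e (nu_m + j omega))."""
    omega = 2.0 * math.pi * frequency_hz
    return n_e * Q_E**2 / (M_E * (nu_m + 1j * omega))

# Example: n_e = 1e16 1/m^3, nu_m = 1e9 1/s, 2.45 GHz drive
sigma = plasma_conductivity(1.0e16, 1.0e9, 2.45e9)
```

The real part is what dissipates power in the electrons, and it vanishes as \nu_m goes to zero. Under this form, a collisionless plasma would be purely reactive.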
Solving the above equation with appropriate boundary conditions allows for the power transferred from the electromagnetic fields to the electrons to be calculated:
(11)
where \mathbf{\bar{J}} is the total current density (the plasma current plus the displacement current density) and * denotes the complex conjugate.
In addition to the equation above, a set of equations is solved in the time domain for the electron density n_e, electron energy density n_{\epsilon}, plasma potential V, and all ionic and neutral species. For the electron density:
(12)
where the electron flux, \mathbf{\Gamma}_e (1/(m^{2}s)), is given by:
(13)
where \mu_e is the electron mobility (m^{2}/(V-s)) and D_e is the electron diffusivity (m^{2}/s). Notice that \mathbf{E} given above has no tilde associated with it. The electric field in this case is a static electric field that arises due to the separation of ions and electrons in the plasma. This is often known as the ambipolar field, and causes loss of electrons and ions to the reactor walls on time scales much longer than the microwave time scale (microseconds rather than sub-nanoseconds). The electron energy density, n_{\epsilon}, is computed using a similar equation:
(14)
where the third term on the left-hand side represents heating or cooling of electrons depending on whether their drift velocity is aligned with the ambipolar electric field. The heating of electrons due to the microwaves is given by the last term on the right-hand side and is defined in equation (11). The electron energy flux is given by:
(15)
The mean electron energy is computed using \bar{\epsilon} = n_{\epsilon}/n_e, \mu_{\epsilon} is the electron energy mobility (m^{2}/(V-s)), D_{\epsilon} is the electron energy diffusivity (m^{2}/s), and the term S_{\epsilon} represents energy loss due to elastic and inelastic collisions. This term is a highly nonlinear function of the mean electron energy and also a function of the electron density, background number density, and plasma chemistry. The complexities of this source term are not relevant to this discussion, but more details are given in the Plasma Module User’s Guide. For each ion and neutral species, a similar drift-diffusion equation is solved for the mass fraction of each species, w_k:
(16)
where the subscript k indicates the k^{th} species. The mass flux vector, \mathbf{j}_k, represents mass transport due to migration from the ambipolar field and diffusion from concentration gradients (kg/m^{2}s) and R_k is the reaction source or sink (kg/m^{3}s). Again, further details are available in the Plasma Module User’s Guide and are not relevant to this discussion.
Finally, Poisson’s equation is solved in order to compute the ambipolar electric field generated by the separation of charges:
(17)
where \rho_v is the space charge density (C/m^{3}) and \rho_v = e(n_i^+-n_e-n_i^-) where n_i^+ is the total number of positive ions and n_i^- is the total number of negative ions.
To summarize, the Microwave Plasma interface solves equations (10), (12), (14), (16), and (17) along with a suitable set of boundary conditions.
In 2D or 2D axisymmetric models, the electromagnetic waves propagate in either the transverse electric (TE) mode or the transverse magnetic (TM) mode. In the TE mode, the electric field is only in the transverse direction and the magnetic field in the direction of propagation. Therefore, COMSOL solves only for the out-of-plane component of the high-frequency electric field. In the TM mode, the magnetic field is in the transverse direction and the electric field only in the direction of propagation, so COMSOL solves only for the in-plane components of the high-frequency electric field.
In the TE mode, electrons do not experience any change in the high-frequency electric field during the microwave time scale. This means that the phase coherence between the electrons and electromagnetic waves is only destroyed through collisions with the background gas. The loss of phase coherence between the electrons and high-frequency fields is what results in energy gain for the electrons. Therefore, the momentum collision frequency is simply given by:
\nu_m = \nu_e
where \nu_e is the collision frequency between the electrons and neutrals.
The TM mode causes in-plane motion of the electrons on the microwave time scale, so in regions where the high-frequency electric field is significant (the contour where the electron density is equal to the critical density), the time-averaged electric field experienced by the electrons may be non-zero. This destroys the phase coherence between the electrons and the fields, causing the electrons to gain energy. This is an example of a non-local kinetic effect, which is difficult to approximate with a fluid model. However, since this effect is similar to collisions with a background gas, the non-local effects can be approximated by adding an effective collision frequency to the momentum collision frequency:
\nu_m = \nu_e + \nu_{\textrm{eff}}
where \nu_{\textrm{eff}} is the effective collision frequency to account for non-local effects. This is discussed in more detail in Ref 1, where an effective collision frequency of no more than \omega/20 is suggested.
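The two cases can be summarized in a few lines of Python. This is a sketch assuming that the TE mode uses \nu_m = \nu_e and the TM mode uses \nu_m = \nu_e + \nu_{\textrm{eff}}, with \nu_{\textrm{eff}} = \omega/20 taken as the upper bound mentioned above; the function and argument names are hypothetical.

```python
import math

def momentum_collision_frequency(nu_e, frequency_hz, mode, divisor=20.0):
    """Return nu_m (1/s) for the TE or TM mode.

    nu_e: electron-neutral collision frequency (1/s)
    divisor: nu_eff = omega / divisor for the TM mode (20 is the suggested limit)
    """
    omega = 2.0 * math.pi * frequency_hz
    if mode == "TE":
        return nu_e                      # only collisions destroy phase coherence
    if mode == "TM":
        return nu_e + omega / divisor    # extra term mimics non-local effects
    raise ValueError("mode must be 'TE' or 'TM'")

nu_te = momentum_collision_frequency(1.0e9, 2.45e9, "TE")
nu_tm = momentum_collision_frequency(1.0e9, 2.45e9, "TM")
```

At 2.45 GHz the added term is a little under 10^9 1/s, so it matters most at low pressure, where \nu_e itself is small.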
When modeling ECR (electron cyclotron resonance) reactors, another layer of complication is added to the problem. The electron transport properties become tensors and functions of a static magnetic flux density, which can be created using permanent magnets. The plasma conductivity also becomes a full tensor, and a highly nonlinear function of the static magnetic flux density. In addition, it is necessary to consider all three components of the electromagnetic field. Comprehensive details on how to set up and solve a model of an ECR reactor can be found in the Dipolar Microwave Plasma Source model documentation.
The Microwave Plasma interface can be used to model the three types of wave heated discharges listed above, but some care is required when setting up such models. In the Microwave Plasma settings window, there are three options under “Electric field components solved for”:
The options are as follows:
The above equations are quite straightforward to solve, provided that the plasma frequency is below the angular frequency everywhere in the modeling domain. At a frequency of 2.45 GHz, this corresponds to an electron density of 7.6×10^{16} 1/m^{3}, which is lower than the electron density in most industrial applications. When the plasma density is equal to this value, the electromagnetic wave transitions from a propagating wave to an evanescent wave. Applications of microwave plasmas where the electron density is greater than the critical density include:
The resonance zone can be smoothed by activating the “Compute tensor plasma conductivity” checkbox in the Plasma Properties section:
The Doppler broadening parameter, \delta, corresponds to the value used for the effective collision frequency via the formula:
Therefore, a value of 20 is a compromise between accuracy and numerical stability as detailed above.
When using the Port boundary condition, the sum of the deposited and reflected power is supplied by default. In COMSOL Multiphysics version 4.4 it is also possible to specify only the deposited power, as shown in the settings window below:
Using this option results in a more stable equation system because the total power transferred to the electrons remains constant. When the “Port input power” option is used, some of the power is deposited and some is reflected back out of the port, depending on the plasma’s current state. The plasma can go from absorbing a very small amount to a very large amount of power in a very short time period, which can make the problem numerically unstable or lead to the solver taking extremely small time steps.
The following is a collection of tips and tricks to try to help with convergence and decrease computation time:
The following suggestions apply to all types of plasmas, but are worth mentioning again:
Solver settings play an important role and COMSOL will automatically generate the best solver settings depending on how the model is set up. By default, when the “Port input power” option is used, the solver settings mentioned below are implemented. The segregated solver is used with two groups:
When the “Specify deposited power” option in the Port boundary condition is used, the solver suggestion is modified so that there are three groups:
P_{\textrm{deposited}}, which is a differential algebraic equation used to fix the deposited, rather than total, power
An example of a TM mode microwave plasma can be found in the Model Gallery. The model uses an effective collision frequency of \omega/20, which smooths the region over which power is deposited to the electrons. As can be seen from the figure below, nearly all power deposition is still highly localized to the contour of critical electron density.
Plot of the power deposition into the plasma due to the high frequency fields. The white contour is the contour of critical electron density.
Collisionless heating of electrons occurring in the TM mode can be demonstrated using the Particle Tracing Module. By starting an ensemble of particles with an initial mean energy of 0.5 eV on the contour of critical plasma density, the time evolution of the mean energy can be computed. The two plots below show how collisionless heating occurs in the TM mode, but not the TE mode.
Plot of the mean electron energy for electrons in a TM mode collisionless plasma released on the contour of critical plasma density. There is a net energy gain even though there are no collisions.
Plot of the mean electron energy for electrons in a TE mode collisionless plasma released on the contour of critical plasma density. There is no net energy gain over a number of RF cycles.
There are three key points that you should recall from the blog post on meshing considerations for linear static problems. These are:
When addressing nonlinear problems, we have already learned that even a finite element problem with a single degree of freedom may not converge, even for a problem that has a solution. We have learned several techniques that can address this issue, but have not yet introduced the interplay of the mesh and the nonlinear solver.
The single most important thing to keep in mind when meshing nonlinear problems is this:
Even if the problem is well-posed, and even if we have chosen a good solution method, the problem may still fail to converge if the problem is not meshed finely enough in the regions of strong nonlinearities.
To understand this, let’s take a look at a one-dimensional thermal finite element problem. We will consider a 1 m thick wall with a fixed temperature of T=0 at one end and T=100 at the other, as shown below:
We will examine the solutions to this problem for the different thermal conductivities plotted below:
If we plot out the solution for the linear case, k=25, we get:
By examination, we see that the solution is a straight line. For this case, the solution can be found by using a single linear element across the entire domain.
Now, if we plot out the case k=\exp(T/25), with elements delineated by dashed lines, we get:
We can see that the solution to this nonlinear problem will require more than a single element across the domain. In fact, regardless of how many elements we use, the polynomial basis function will never perfectly match the true solution. We can successively refine the mesh everywhere in the domain and get closer and closer to the true solution, just as we did for a linear problem.
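To make the effect of refinement concrete, here is a minimal sketch for the k=\exp(T/25) case. It assumes the wall is at steady state with no heat source, so that the integral of k over temperature varies linearly with position (the Kirchhoff transformation), which gives a closed-form T(x). The function names are ours, and the quantity measured is only the gap between the exact solution and its piecewise-linear interpolant on a uniform mesh, not the error of an actual finite element solve.

```python
import math

T_HOT = 100.0  # temperature at x = 1 (T = 0 at x = 0)

def exact_temperature(x):
    """Exact T(x) for k(T) = exp(T/25), assuming steady state and no source."""
    # The integral of k from 0 to T, 25*(exp(T/25) - 1), is linear in x.
    return 25.0 * math.log(1.0 + x * (math.exp(T_HOT / 25.0) - 1.0))

def max_interpolation_error(n_elements, samples=64):
    """Largest gap between T(x) and its piecewise-linear interpolant."""
    h = 1.0 / n_elements
    worst = 0.0
    for e in range(n_elements):
        x0 = e * h
        t0 = exact_temperature(x0)
        t1 = exact_temperature(x0 + h)
        for s in range(samples + 1):
            w = s / samples
            linear = (1.0 - w) * t0 + w * t1
            worst = max(worst, abs(exact_temperature(x0 + w * h) - linear))
    return worst

errors = {n: max_interpolation_error(n) for n in (1, 2, 4, 8, 16)}
for n, err in errors.items():
    print(f"{n:2d} elements: max interpolation error = {err:.3f}")
```

The error shrinks with every uniform refinement but never reaches zero, which is the behavior described above.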
Finally, if we plot out the case k=1+50\exp\left[-(T-50)^2\right], we get:
This solution is more complicated. There are clearly regions of the solution where a single element would be almost sufficient to completely describe the solution. Yet, there are regions where the solution varies quite rapidly as a function of position. These regions are around T=50, where there are strong nonlinearities in the material property function. Although the material property function has only one region of strong nonlinearity with respect to temperature, the solution exhibits two regions over the domain where the solution varies rapidly. Only these regions in space require a finer mesh. In fact, the solver might not converge at all if the mesh in these regions is too coarse.
For these types of problems, adaptive mesh refinement becomes highly motivated, since the locations of the gradients in the modeling domain are generally not known ahead of time. Ramping of the nonlinearities in the model is also helpful, since starting with a linear problem will result in a problem that can always be solved, regardless of the mesh. By gradually ramping up the nonlinearity, and performing adaptive mesh refinement iteratively, it is possible to improve model convergence for nonlinear problems.
Meshing of nonlinear stationary finite element problems is inherently linked with the question of getting a nonlinear model to converge. Convergence rates, and even the possibility of convergence, are dependent on both the solver algorithm used and the mesh. All of the techniques mentioned up to this point: manual and adaptive mesh refinement, choosing of initial conditions, load ramping, nonlinearity ramping, and any combination of these techniques may be needed as you develop more and more sophisticated models. Finally, always keep in mind that a mesh refinement study is needed to assess solution accuracy.
For an example model that incorporates all of the techniques that we have learned about thus far, please see the Cooling and Solidification of Metal model. Mastering these techniques will allow you to quickly and efficiently model nonlinear problems.
Consider again the system shown below, of a force applied to a spring with nonlinear stiffness.
We’ve seen that we can solve this problem using the Newton method with damping or by using the continuation method and ramping the load to give the Newton method good starting points. Now we’ll examine how to ramp the nonlinearity. First let’s take another look at the function describing the force balance on our single node:
We can re-write this more generally as: f(u)=p-k(u)u where k(u) is the nonlinear spring stiffness. Now we can solve a different problem that uses a stiffness defined as:
In other words, we divide our spring stiffness function into two parts — a linear term, k(u_0), and a nonlinear term, \left[ k(u) - k(u_0) \right] — and then introduce an additional parameter, \beta, that interpolates between the linear and nonlinear case. We then use the same Newton method as before on a series of problems with the parameter \beta, getting ramped from zero to one. That is, we use the continuation method to ramp from a (simple to solve) linear problem to a (more difficult) nonlinear problem.
Next we will look at solving the above example by using this technique. Our original spring stiffness, k(u)=\exp(u), gets re-written as:
We start by solving for \beta=0 and get a linear spring stiffness of k(u)=\exp(u_0), so now all we need to do is choose a linearization point, u_0. For this example, if we choose u_0=0, we see that f(u,\beta=0)=2-\exp(0)u = 2-u. Recall our discussion about solving linear static finite element problems, where we learned that you will always find the solution to a linear problem in a single Newton iteration. Now, ramp up the parameter \beta, as shown:
Clearly, only a few Newton iterations, starting from the solution to the \beta=0 case, are needed to solve \beta=0.25. So we can repeat with \beta=1 and thereby ramp from the fully linear to the fully nonlinear case.
This method is attractive because you can always find a solution to a linear problem, so you can always solve for \beta=0. You only need to consider which point u_0 to linearize about initially and what kind of nonlinearity ramping to use.
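As a minimal sketch of the procedure described above, the following uses the blended stiffness k(u_0) + \beta\left[k(u) - k(u_0)\right] with k(u)=\exp(u), a load of 2, and u_0=0. The function names, the tolerance, and the five-step ramp are our own illustrative choices; the Newton loop here is plain and undamped, and is not meant to reproduce COMSOL's solver.

```python
import math

P = 2.0    # applied load
U0 = 0.0   # linearization point for the stiffness

def newton(f, dfdu, u, tol=1e-10, max_iter=50):
    """Plain (undamped) Newton iteration on a scalar equation f(u) = 0."""
    for iteration in range(1, max_iter + 1):
        u = u - f(u) / dfdu(u)
        if abs(f(u)) < tol:
            return u, iteration
    raise RuntimeError("Newton iteration did not converge")

def solve_with_ramping(betas):
    """Solve for each beta in turn, starting from the previous solution."""
    u = 0.0
    history = []
    for beta in betas:
        def stiffness(v, b=beta):
            # Linear part k(u0) plus the ramped nonlinear part.
            return math.exp(U0) + b * (math.exp(v) - math.exp(U0))

        def f(v):
            return P - stiffness(v) * v

        def dfdu(v, b=beta):
            return -(stiffness(v) + b * math.exp(v) * v)

        u, iterations = newton(f, dfdu, u)
        history.append((beta, u, iterations))
    return history

history = solve_with_ramping([0.0, 0.25, 0.5, 0.75, 1.0])
for beta, u, iterations in history:
    print(f"beta = {beta:.2f}: u = {u:.4f} after {iterations} Newton iterations")
```

The \beta=0 step is the linear problem and is solved in a single iteration; each later step starts from the previous solution.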
We can also use the concept of nonlinearity ramping to address the case where the nonlinear terms are not continuously differentiable. Recall from the blog post on Solving Nonlinear Static Finite Element Problems the system with the piecewise constant spring stiffness, k=0.5 for u\le1.8, k=1 for 1.8<u<2.2, and k=1.5 for u\ge2.2, which results in a force balance function:
As we saw earlier, this problem cannot be solved by the Newton method, unless you happen to start within the (very small) radius of convergence of the solution. But now consider replacing the original spring stiffness with a smoothed stiffness that can be ramped up as shown in the figure below:
Clearly, this problem is solvable, and we can use this technique to get an approximate solution to the original problem. Using this method just requires that we find an appropriate smoothing function and nonlinearity ramping path.
Whenever you have a problem that has the kind of stepped behavior shown above, it is also worth trying out the Double-Dogleg nonlinear solver instead of the Newton method. The Double-Dogleg is a Trust Region solver that works well when solving problems where the Newton method may oscillate between two different regions. A good physical example of this is a structural contact problem, where there is a sudden transfer of load as two objects come into physical contact.
We have now seen two methods for improving the convergence of nonlinear problems: load ramping and nonlinearity ramping. In practice, either or both methods can be used, and, through the careful design of your material properties and loads, it is possible to blend the two approaches. It can be difficult to say ahead of time which method will perform better, and each model you work on will require some experimentation in terms of the load ramping path, the nonlinearity ramping, and the choice of initial condition for linearization. Also, if you expect that the solution may oscillate between different cases, the Double-Dogleg solver can perform better than the Newton method. With experience, you will build up your engineering intuition about how best to solve the classes of problems that you are working on.
The techniques introduced here often work well for nonlinear static finite element problems where it may be difficult to find good initial conditions, or problems which have strong nonlinearities and discontinuities in the material properties. In practice, a very wide class of problems can be addressed using these approaches. However, you must also be aware that there are different meshing requirements when solving nonlinear problems. That is the next topic we will address, so stay tuned.
Consider again the system of a force applied to a spring with nonlinear stiffness:
We can solve this problem with the damped Newton-Raphson method as long as we choose an appropriate initial condition (earlier we chose u_0=0). In the other blog entry, we noticed that choosing an initial condition outside of the radius of convergence, any point u_0\le-1 for example, will cause the solver to fail. Now, for this single degree of freedom problem we can easily determine the radius of convergence, but for typical finite element problems it would be much harder. So instead of trying to find the radius of convergence, let’s apply a little bit of physical intuition to this problem.
Here we are applying a load, p_f, to a system and we are trying to find a solution by starting from an initial condition, u_0. But what happens if we apply a load p=0? Newton’s First Law tells us that a system under no load will have no deformations. So what happens if we apply a load, p_1, of magnitude infinitesimally larger than zero? It would be reasonable to assume that the Newton-Raphson method, starting from u_0=0, will be able to find a solution, u_1. It is also reasonable to say that we can then increment the load to p_2 such that p_1<p_2<p_f and again find a solution u_2, as long as the load increment is small enough. Repeating this algorithm, we will eventually get to the final load p_f, and our desired solution. That is, starting from a zero load, and zero solution, we gradually ramp up the load until we achieve the desired total load. This procedure is plotted in the figure below. The dark arrows indicate where the Newton-Raphson iterations start for a particular load value.
This algorithm is also referred to as a continuation method on the load. This gradual ramping up of the load from a value close to zero is often a more robust approach to solving nonlinear problems via the damped Newton method, since each previous solution is a good initial guess for the next step.
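The load ramping idea can be sketched in a few lines for the spring with k(u)=\exp(u). This is only an illustration: the number of load steps, the tolerance, and the function names are assumptions of ours, and the Newton loop is undamped.

```python
import math

def residual(u, p):
    """Force balance f(u) = p - exp(u)*u for the stiffening spring."""
    return p - math.exp(u) * u

def residual_derivative(u):
    return -math.exp(u) * (1.0 + u)

def newton(p, u, tol=1e-10, max_iter=50):
    """Undamped Newton iteration for a fixed load p, starting from u."""
    for iteration in range(1, max_iter + 1):
        u = u - residual(u, p) / residual_derivative(u)
        if abs(residual(u, p)) < tol:
            return u, iteration
    raise RuntimeError("Newton iteration did not converge")

def ramp_load(p_final, n_steps):
    """Increase the load in equal increments, reusing each solution as the next guess."""
    u = 0.0  # a system under no load has no deformation
    path = []
    for step in range(1, n_steps + 1):
        p = p_final * step / n_steps
        u, iterations = newton(p, u)
        path.append((p, u, iterations))
    return path

path = ramp_load(2.0, 10)
for p, u, iterations in path:
    print(f"p = {p:.1f}: u = {u:.4f} after {iterations} Newton iterations")
```

Each load step starts from the solution of the previous one, so every individual Newton solve starts close to its answer.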
With this algorithm, we not only have a good way of addressing the issue of finding a good starting point for the Newton-Raphson iterations, we also have an algorithm that is useful for the case of a problem that does not have a solution. Consider again the problem where the spring gets weaker as it is pulled, where f(u)=2-\exp(-u)u as discussed previously. This problem does not have a solution. In this case, we can analytically determine that for any load p>\exp(-1) there is no solution. But if we use a smaller load, then the system is stable. In fact, in our scenario the system is bi-stable; there are two solutions for every load 0<p<\exp(-1), although we are probably only interested in the branch we get to starting from p=0 and u_0=0. Let us plot out f(u):
Now let’s also assume that we do not know that the peak possible load is at p = \exp(-1), and examine what happens when COMSOL tries to solve this problem for p = 0.2, 0.3, 0.4. If we plot out f(u) for p = 0.2, 0.3, 0.4, we see that for p = 0.4 there is no solution to be found. The continuation solver in COMSOL will then automatically perform a search over the interval between the last successful load value and the next desired load step. That is, the solver tries to backtrack to find an intermediate solution that can then be used as a starting value for the next step. This algorithm is always used whenever the Continuation Method feature (or the Parametric Sweep feature) is used on a single parameter when solving a stationary problem. In that case, the solver will be able to find the approximate failure load of the system, which is also very useful information.
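The backtracking idea can be illustrated with a deliberately simplified sketch. It is not COMSOL's continuation algorithm: the step-halving rule, the runaway guard, the tolerances, and the function names are all assumptions made for illustration. For the weakening spring, f(u) = p - \exp(-u)u, the search stalls just below the analytic peak load \exp(-1).

```python
import math

def newton(p, u, tol=1e-10, max_iter=50):
    """Return the converged displacement, or None if Newton fails."""
    for _ in range(max_iter):
        slope = -math.exp(-u) * (1.0 - u)  # df/du for f = p - exp(-u)*u
        if slope == 0.0:
            return None
        u = u - (p - math.exp(-u) * u) / slope
        if abs(u) > 50.0:  # iterate has run away; treat as failure
            return None
        if abs(p - math.exp(-u) * u) < tol:
            return u
    return None

def find_failure_load(p_target, step=0.1, min_step=1e-6):
    """Ramp the load; when a step fails, retry with half the increment."""
    p, u = 0.0, 0.0
    while p < p_target and step >= min_step:
        p_try = min(p + step, p_target)
        u_try = newton(p_try, u)
        if u_try is None:
            step *= 0.5          # backtrack toward the last converged load
        else:
            p, u = p_try, u_try  # accept and continue from this solution
    return p, u

p_fail, u_fail = find_failure_load(0.4)
print(f"last converged load: {p_fail:.6f} (analytic peak {math.exp(-1):.6f})")
```

The last converged load is a lower bound on the failure load, accurate to roughly the minimum step size.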
We have now introduced the concept of load ramping and using the continuation method to improve the robustness of the Newton method. Since a system with no load has a known solution, we have seen that this technique can eliminate the question of what value to choose for the initial condition. We also learned that it is possible to approximately find the failure load. For these reasons, load ramping is one important technique that you should understand when setting up and solving nonlinear static finite element problems.
Let’s take a look at a log file from a nonlinear finite element problem. We’ll set up and solve the problem described above, of a nonlinear spring that gets weaker as we pull on it. We know that this problem does not have a solution, so let’s see what happens:
Stationary Solver 1 in Solver 1 started at 15-Jul-2013 11:26:46.
Parametric solver
Nonlinear solver
Number of degrees of freedom solved for: 1.
Parameter P = 0.2.
Symmetric matrices found.
Scales for dependent variables:
State variable u (mod1.ODE1): 1
Iter  ErrEst    Damping    Stepsize  #Res  #Jac  #Sol
1     0.18      1.0000000  1         2     1     2
2     0.013     1.0000000  0.22      3     2     4
3     6.5e-005  1.0000000  0.015     4     3     6
Parameter P = 0.3.
Iter  ErrEst    Damping    Stepsize  #Res  #Jac  #Sol
1     0.025     1.0000000  0.21      7     4     9
2     0.00069   1.0000000  0.031     8     5     11
Parameter P = 0.4.
Iter  ErrEst    Damping    Stepsize  #Res  #Jac  #Sol
1     0.89      1.0000000  2.7       11    6     14
2     0.3       0.8614583  0.76      12    7     16
3     0.2       0.8154018  0.43      13    8     18
4     0.31      0.4194888  0.42      14    9     20
5     0.86      0.0836516  0.9       15    10    22
Parameter P = 0.325.
Iter  ErrEst    Damping    Stepsize  #Res  #Jac  #Sol
1     0.089     1.0000000  0.4       18    12    26
2     0.014     1.0000000  0.13      19    13    28
3     0.0003    1.0000000  0.018     20    14    30
Parameter P = 0.375.
Iter  ErrEst    Damping    Stepsize  #Res  #Jac  #Sol
1     0.099     1.0000000  0.32      23    15    33
2     0.079     0.9390806  0.19      24    16    35
3     0.2       0.3028345  0.24      25    17    37
4     0.94      0.0302834  0.95      26    18    39
... SOME PARTS OF THIS LOG FILE OMITTED ...
Parameter P = 0.368359.
Iter  ErrEst    Damping    Stepsize  #Res  #Jac  #Sol
1     0.046     1.0000000  0.057     80    49    112
2     0.061     0.3013806  0.072     81    50    114
Stationary Solver 1 in Solver 1: Solution time: 0 s
Physical memory: 471 MB
Virtual memory: 569 MB
The solver also reports an error:
Failed to find a solution for all parameters, even when using the minimum parameter step. No convergence, even when using the minimum damping factor. Returned solution is not converged.
The beginning of the log file is as before, except that the solver now reports that the Parametric Solver is being called. We see that, for P = 0.2 and P = 0.3, the solver completes. For P = 0.4, the solver fails and then automatically backtracks to try to find intermediate points that solve. Some of the intermediate steps are omitted for brevity, but we see that the parametric solver ends up very close to the analytic solution for the peak load. From this information, we could re-solve the problem with a different set of parameters and get a better idea of how the system behaves as we approach the failure load, which is often useful information.
Consider the system shown below, of a spring that is attached to a rigid wall at one end, and with an applied force at the other end. The stiffness of the spring is a function of the distance it is stretched, k(u)=\exp(u). That is, the spring stiffness increases exponentially as it is stretched.
We are interested in finding the displacement of the end of the spring, where the force is applied. Just as we did earlier for the linear problem, we can now write the following function describing the balance of forces on the node for the nonlinear finite element problem:
In this case, only the spring stiffness is dependent on the solution, but more generally, both the load and the properties of the elements can be arbitrarily dependent upon the solution in a nonlinear problem.
Let us plot out this function, and keep in mind that we are trying to find u such that f(u)=0.
Finding the solution to the problem is, in fact, only marginally different from the linear case. Recall that to solve the linear problem we took a single Newton-Raphson iteration — and we do the exact same thing here:
As you can see, we again start at an initial guess to the solution, u_0=0, and evaluate the function, f(u_0), as well as its derivative, f'(u_0). This gets us to the point u_1. By examination, we see that this is not the solution, since f(u_1) \ne 0. But if we continue to take Newton-Raphson iterations, as shown below, it becomes clear that we are approaching the solution to the problem. (For more details about this algorithm, you can use this resource on Newton’s method.)
So finding the solution to a nonlinear problem is essentially identical to solving a linear problem, except that we take multiple Newton-Raphson steps to get to the solution. In fact, we could continue to take iterations and get arbitrarily close to the solution, but this is not needed. As discussed earlier, we always run into issues of numerical precision on computers, so there is a practical limit to how close we can get. Let’s have a look at the results after several iterations:
i | u_i | \lvert f(u_i) \rvert | \lvert u_{i-1}-u_i \rvert | \lvert f(u_{i-1})-f(u_i) \rvert |
---|---|---|---|---|
0 | 0.000 | 2.000 | | |
1 | 2.000 | 12.77 | 2.000 | 10.77 |
2 | 1.424 | 3.915 | 0.576 | 8.855 |
3 | 1.035 | 0.914 | 0.389 | 3.001 |
4 | 0.876 | 0.104 | 0.159 | 0.810 |
5 | 0.853 | 0.002 | 0.023 | 0.102 |
6 | 0.852 | 0.001 | 0.001 | 0.001 |
After six iterations, we see here that the difference between successive values of f(u), and u, as well as the absolute value of f(u), is reduced to 0.001 or less. After six Newton-Raphson iterations starting from u_0=0, the solution has converged to within a tolerance of 0.001. When we solve nonlinear problems, we apply this algorithm until the solution has converged to within the desired tolerance. There is a second termination criterion: that the solver should take no more than a specified number of iterations. Whichever criterion is satisfied first, tolerance or number of iterations, will stop the solver. Also, keep in mind the discussion from the blog post on solving linear static finite element problems about the numerical scaling of the problem. The tolerance criterion applies to the scaled solution vector, not the absolute values of the solution.
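For readers who want to reproduce iterations like these, here is a minimal sketch of the undamped Newton-Raphson loop for f(u)=2-\exp(u)u, starting from u_0=0. The stopping test (step size and residual both below the tolerance, or a maximum iteration count) is our own simplification; as noted above, COMSOL applies its tolerance to a scaled solution vector.

```python
import math

def f(u):
    """Force balance for a load of 2 and stiffness k(u) = exp(u)."""
    return 2.0 - math.exp(u) * u

def dfdu(u):
    return -math.exp(u) * (1.0 + u)

def newton_iterates(u0, tol=1e-3, max_iter=25):
    """Return every iterate, stopping on the tolerance or the iteration limit."""
    iterates = [u0]
    for _ in range(max_iter):
        u_old = iterates[-1]
        u_new = u_old - f(u_old) / dfdu(u_old)
        iterates.append(u_new)
        if abs(u_new - u_old) < tol and abs(f(u_new)) < tol:
            break
    return iterates

iterates = newton_iterates(0.0)
for i, u in enumerate(iterates):
    print(f"i = {i}: u = {u:.3f}, |f(u)| = {abs(f(u)):.3f}")
```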
Although it is more complicated to visualize, this is the same algorithm used to solve problems where u is a vector, as is the case for typical nonlinear finite element problems. However, when solving a problem with hundreds, thousands, or even millions of degrees of freedom, it is desirable to take as few Newton-Raphson steps as possible. Recall that we need to solve \mathbf{u}_{i+1}=\mathbf{u}_{i}-[\mathbf{f}'(\mathbf{u}_{i})]^{-1}\mathbf{f}(\mathbf{u}_{i}) and that computing the inverse of the derivative is the most computationally intensive step. To avoid proceeding into a region where there is no solution, and to minimize the number of Newton-Raphson steps taken, COMSOL uses a damping factor. Consider again the first Newton-Raphson step plotted earlier, and observe that for this step |\mathbf{f}(\mathbf{u}_{i+1})|>|\mathbf{f}(\mathbf{u}_{i})|. So for this iteration, we have taken too large of a step. When this happens, COMSOL will perform a simple search along the interval [\mathbf{u}_{i},\mathbf{u}_{i+1}] for a point \mathbf{u}_{damped}=\mathbf{u}_i+\alpha(\mathbf{u}_{i+1}-\mathbf{u}_i) such that |\mathbf{f(u}_{damped})|<|\mathbf{f(u}_{i})|. The Newton-Raphson iteration scheme is then restarted at this point.
The term \alpha is known as the damping factor and has bounds 0< \alpha \le 1. As \alpha \rightarrow 0 we say that the damping is increased, while \alpha = 1 means that the problem is undamped. This method is attractive because the search requires only that COMSOL evaluates \mathbf{f(u}_{damped}) and the computational cost of this is quite low as compared to computing the derivative \mathbf{f'(u}_{i}) and its inverse [\mathbf{f}'(\mathbf{u}_i)]^\mathbf{-1}.
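A minimal sketch of the damping idea follows. The rule used here, halving \alpha until the residual magnitude decreases, is an assumption chosen for simplicity; the text above only says that a simple search along the interval is performed and does not specify COMSOL's actual rule. The problem is again f(u)=2-\exp(u)u with u_0=0.

```python
import math

def f(u):
    return 2.0 - math.exp(u) * u

def dfdu(u):
    return -math.exp(u) * (1.0 + u)

def damped_newton(u, tol=1e-10, max_iter=50, min_alpha=1e-4):
    """Newton iteration with a halving search on the damping factor alpha."""
    alphas = []
    for _ in range(max_iter):
        if abs(f(u)) < tol:
            return u, alphas
        full_step = -f(u) / dfdu(u)  # the one expensive derivative solve
        alpha = 1.0
        # Only cheap residual evaluations are needed inside the search.
        while abs(f(u + alpha * full_step)) >= abs(f(u)) and alpha > min_alpha:
            alpha *= 0.5
        u = u + alpha * full_step
        alphas.append(alpha)
    raise RuntimeError("damped Newton iteration did not converge")

u_final, alphas = damped_newton(0.0)
print(f"u = {u_final:.6f}, damping factors used: {alphas}")
```

For this starting point, the full first step would increase the residual, so that step is damped; the later steps are taken undamped.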
It is important to emphasize that this damping term has no direct physical interpretation. Although this method works quite well to improve convergence, there is very little physical insight that can be gleaned by examining the damping factor. Furthermore, although COMSOL does allow you to manually modify the damping factor, it is not generally possible to use any physical understanding or information from the model as guidance when doing so. The default choice of damping algorithm is difficult to outperform through manual intervention. However, there are other techniques that can be used, which are usually motivated by the physics of the problem, that work well when the default damped Newton-Raphson methods converge slowly or not at all.
Nonlinear problems are inherently difficult to solve since there are multiple ways in which the above solution procedure can fail to converge. Although there are many ways in which the Newton-Raphson method can fail, in practice we can reduce the discussion to the following cases.
First, consider the same nonlinear problem as before, but with a different starting point, for example, u_0=-2. As we can see from the plot below, if we choose any initial condition u_0\le-1, the Newton-Raphson method cannot find a solution since the derivatives of f(u) do not point towards the solution. There is no solution to be found to the left of u_0=-1, so these starting points are outside of the radius of convergence of the Newton-Raphson method. The choice of initial condition can cause the Newton-Raphson method to fail to converge, even if a solution exists. So, unlike the linear case, where a well-posed problem will always solve, the convergence of nonlinear models may be highly dependent on the choice of starting condition. We will address later how best to choose a good initial condition.
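The dependence on the starting point is easy to check numerically. The sketch below takes a few undamped Newton-Raphson steps on f(u)=2-\exp(u)u from two starting points; the helper name and the step counts are illustrative choices of ours.

```python
import math

def f(u):
    return 2.0 - math.exp(u) * u

def dfdu(u):
    return -math.exp(u) * (1.0 + u)

def newton_path(u0, n_steps=3):
    """Take up to n_steps undamped Newton steps and return all iterates."""
    path = [u0]
    for _ in range(n_steps):
        slope = dfdu(path[-1])
        if slope == 0.0:  # tangent is flat: no Newton step can be taken
            break
        path.append(path[-1] - f(path[-1]) / slope)
    return path

bad_start = newton_path(-2.0)
good_start = newton_path(0.0, n_steps=10)
print("from u0 = -2:", bad_start)
print("from u0 =  0:", [round(u, 4) for u in good_start])
```

From u_0=-2 every step moves further to the left, away from the solution, while from u_0=0 the iterates settle near u \approx 0.853.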
The nonlinear solver will also fail if the problem itself does not have a solution. Consider again the problem from above, but with a spring stiffness of k(u)=\exp(-u). In other words, as the spring gets stretched, the stiffness decreases. If we plot out f(u) for a load of p=2, we see that there is no solution to be found. Unfortunately, the Newton-Raphson algorithm cannot determine that this is the case; the algorithm will simply fail to find a solution and terminate after a user-specifiable number of iterations.
Last, consider the case of a material property that has a discontinuous change in properties. For example, consider the same system as before, but with a spring stiffness that has different values over different intervals, a value of k=0.5 for u\le1.8, a value of k=1 for 1.8<u<2.2, and k=1.5 for u\ge2.2. If we plot out f(u) for this case we see that it is non-differentiable and discontinuous, which is a violation of the requirements of the Newton-Raphson method. It is also clear by examination that unless we choose a starting point in the interval 1.8<u<2.2 the Newton-Raphson iterations will oscillate between iterations outside of this interval.
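The oscillation can be reproduced with a short sketch, again assuming a load of 2. Away from the jumps the stiffness is constant, so we take the slope of f(u)=2-k(u)u to be -k(u); that choice, and the function names, are our assumptions for illustration.

```python
def stiffness(u):
    """Piecewise constant spring stiffness from the example above."""
    if u <= 1.8:
        return 0.5
    if u < 2.2:
        return 1.0
    return 1.5

def newton_path(u0, n_steps):
    """Undamped Newton steps on f(u) = 2 - k(u)*u, using slope -k(u)."""
    path = [u0]
    for _ in range(n_steps):
        u = path[-1]
        k = stiffness(u)
        path.append(u - (2.0 - k * u) / (-k))
    return path

outside = newton_path(0.0, 6)
inside = newton_path(2.0, 3)
print("starting at u0 = 0:", [round(u, 4) for u in outside])
print("starting at u0 = 2:", inside)
```

Starting outside the interval 1.8<u<2.2, the iterates bounce between u=4 and u=4/3 without ever landing in the middle interval, where the solution u=2 lies.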
To summarize, so far we have introduced the damped Newton-Raphson method used to solve nonlinear finite element problems and discussed the convergence criteria used. We introduced several ways in which this method can fail to find a solution, including:
We will soon discuss ways of addressing all of these issues, but first, let’s take a look at the log file of a typical nonlinear finite element problem. Below you will see the log file (with line numbers added) from a geometric nonlinear structural mechanics problem:
1) Stationary Solver 1 in Solver 1 started at 10-Jul-2013 15:23:07.
2) Nonlinear solver
3) Number of degrees of freedom solved for: 2002.
4) Symmetric matrices found.
5) Scales for dependent variables:
6) Displacement field (Material) (mod1.u): 1
7) Iter  ErrEst    Damping    Stepsize  #Res  #Jac  #Sol
8) 1     6.1       0.1112155  7         3     1     3
9) 2     0.12      0.6051934  1.2       4     2     5
10) 3    0.045     1.0000000  0.18      5     3     7
11) 4    0.012     1.0000000  0.075     6     4     9
12) 5    0.0012    1.0000000  0.018     7     5     11
13) 6    1.6e-005  1.0000000  0.0015    8     6     13
14) Stationary Solver 1 in Solver 1: Solution time: 1 s
15) Physical memory: 849 MB
16) Virtual memory: 946 MB
Now you should have gained an understanding of how nonlinear static problems are solved in COMSOL as well as how to interpret the log file.
Let’s consider a linear static finite element problem composed of three nodes and three elements:
Each element is bounded by two nodes. One of the nodes is at the rigid wall, where we know the displacement will be zero, so we do not need to solve for that node. As we saw in the earlier blog post on linear static finite element problems, we can write a balance of forces for each node:
and we can write this as:
or even more compactly as:
We can solve this problem using the Newton-Raphson iteration method, and since this is a linear static problem, we can solve it in one iteration and using an initial value of \mathbf{u}_{init}=\mathbf{0}, giving us this solution:
Now, this problem only has two unknowns, or degrees of freedom (DOFs), and can easily be solved with pen and paper. But in general, your matrices will have thousands to millions of DOFs, and finding the solution to the above equation is usually the most computationally demanding part of the problem. When solving such a system of linear equations on a computer, one should also be aware of the concept of a condition number, a measure of how sensitive the solution is to a change in the load. Although COMSOL never directly computes the condition number (it is as expensive to do so as solving the problem), we do speak of the condition number in relative terms. This number comes into play with the numerical methods used to solve systems of linear equations.
There are two fundamental classes of algorithms that are used to solve for \bf{K^{-1}b}: direct and iterative methods. We will introduce both of these methods and look at their general properties and relative performance, below.
The direct solvers used by COMSOL are the MUMPS, PARDISO, and SPOOLES solvers. All of the solvers are based on LU decomposition.
These solvers will all arrive at the same answer for all well-conditioned finite element problems, which is their biggest advantage, and can even solve some quite ill-conditioned problems. From the point of view of the solution, it is irrelevant which one of the direct solvers you choose, as they will return the same solution. The direct solvers differ primarily in their relative speed. The MUMPS, PARDISO, and SPOOLES solvers can each take advantage of all of the processor cores on a single machine, but PARDISO tends to be the fastest and SPOOLES the slowest. SPOOLES also tends to use the least memory of all of the direct solvers. All of the direct solvers do require a lot of RAM, but MUMPS and PARDISO can store the solution out-of-core, which means that they can offload some of the problem onto the hard disk. The MUMPS solver also supports cluster computing, allowing you to use more memory than is typically available on any single machine.
If you are solving a problem that does not have a solution (such as a structural problem with loads, but without constraints) then the direct solvers will still attempt to solve the problem, but will return an error message that looks similar to:
Failed to find a solution. The relative residual (0.06) is greater than the relative tolerance. Returned solution is not converged.
If you get this type of error message, then you should check to make sure that your problem is correctly constrained.
The iterative solvers in COMSOL encompass a variety of approaches, but they are all conceptually quite simple to understand at their highest level, being essentially similar to a conjugate gradient method. Other variations include the generalized minimum residual method and the biconjugate gradient stabilized method, and there are many variations on these, but they all behave similarly.
Contrary to direct solvers, iterative methods approach the solution gradually, rather than in one large computational step. Therefore, when solving a problem with an iterative method, you can observe the error estimate in the solution decrease with the number of iterations. For well-conditioned problems, this convergence should be quite monotonic. If you are working on problems that are not as well-conditioned, then the convergence will be slower. Oscillatory behavior of an iterative solver is often an indication that the problem is not properly set up, such as when the problem is not sufficiently constrained. A typical convergence graph for an iterative solver is shown below:
By default, the model is considered converged when the estimated error in the iterative solver is below 10^{-3}. This is controlled in the Solver Settings window:
This tolerance can be made looser, for faster solutions, or tighter, for greater accuracy on the current mesh. The tolerance must always be greater than a lower limit that depends on the machine precision (2.22×10^{-16}) and the condition number (which is problem dependent). However, there is usually no point in making the tolerance too tight, since the inputs to your model, such as material properties, are often not accurate to more than a couple of digits. If you are going to change the relative tolerance, we generally recommend tightening it in increments of one order of magnitude and comparing solutions. Keep in mind that you are only solving to a tighter tolerance on the mesh that you are currently using, and it is often more reasonable to refine the mesh.
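To make the role of the relative tolerance concrete, here is a minimal conjugate gradient sketch in plain Python. It is not how COMSOL implements its iterative solvers; the function names, the small tridiagonal test matrix, and the stopping rule based on the relative residual are assumptions chosen for illustration. The solver records the relative residual after every iteration, so you can see the estimate fall and see that a tighter tolerance costs more iterations.

```python
def tridiagonal(n, diag=4.0, off=-1.0):
    """Build a small symmetric positive definite test matrix (illustrative only)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = diag
        if i + 1 < n:
            A[i][i + 1] = off
            A[i + 1][i] = off
    return A


def conjugate_gradient(A, b, rel_tol=1e-3, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Returns the solution and the history of relative residuals ||r|| / ||b||.
    """
    x = [0.0] * len(b)
    r = b[:]                      # residual b - A x for the zero initial guess
    p = r[:]                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    b_norm = rs_old ** 0.5
    history = [1.0]
    for _ in range(max_iter):
        if history[-1] < rel_tol:
            break                 # converged to the requested tolerance
        Ap = [sum(a * pj for a, pj in zip(row, p)) for row in A]
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        history.append(rs_new ** 0.5 / b_norm)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, history


A = tridiagonal(20)
b = [1.0] * 20

x_loose, hist_loose = conjugate_gradient(A, b, rel_tol=1e-3)
x_tight, hist_tight = conjugate_gradient(A, b, rel_tol=1e-8)

print("iterations at 1e-3:", len(hist_loose) - 1)
print("iterations at 1e-8:", len(hist_tight) - 1)
```

Both runs solve the same discrete system, so the tighter run only reduces the algebraic error on this particular matrix; it says nothing about discretization error, which is why refining the mesh is often the better use of effort.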
The big advantage of the iterative solvers is their memory usage, which is significantly less than a direct solver for the same sized problems. The big disadvantage of the iterative solvers is that they do not always “just work”. Different physics do require different iterative solver settings, depending on the nature of the governing equation being solved.
Luckily, COMSOL already has built-in default solver settings for all predefined physics interfaces. COMSOL will automatically detect the physics being solved as well as the problem size, and choose the solver — direct or iterative — for that case. The default iterative solvers are chosen for the highest degree of robustness and lowest memory usage, and do not require any interactions from the user to set them up.
When solving the systems of linear equations of a simulation, COMSOL will automatically detect the best solver without requiring any user interaction. The direct solvers will use more memory than the iterative solvers, but can be more robust. Iterative solvers approach the solution gradually, and it is possible to change the convergence tolerance, if desired.
]]>
As we saw earlier, there are four different 3D element types — tets, bricks, prisms, and pyramids:
These four elements can be used, in various combinations, to mesh any 3D model. (For 2D models, you have triangular and quadrilateral elements available. We won’t discuss 2D very much here, since it is a logical subset of 3D that doesn’t require much extra explanation.) What we haven’t yet discussed in depth is why you would want to use these various elements.
Tetrahedral elements are the default element type for most physics within COMSOL. A tetrahedron is the 3D simplex, the simplest possible volume element, and any 3D volume, regardless of shape or topology, can be meshed with tets. They are also the only kind of elements that can be used with adaptive mesh refinement. For these reasons, tets should usually be your first choice.
The other three element types (bricks, prisms, and pyramids) should be used only when there is a clear motivation to do so. It is first worth noting that these elements will not always be able to mesh a particular geometry. The meshing algorithm usually requires more user input to create such a mesh, so before going through this effort, you need to ask yourself whether it is worthwhile. Here we will talk about the motivations behind using brick and prism elements. Pyramids are only used to create a transition in the mesh between bricks and tets.
The primary motivation in COMSOL for using brick and prism elements is that they can significantly reduce the number of elements in the mesh. These elements can have very high aspect ratios (the ratio of longest to shortest edge) whereas the algorithm used to create a tet mesh will try to keep the aspect ratio close to unity. It is reasonable to use high aspect ratio brick and prism elements when you know that the solution varies gradually in certain directions, or if you are not very interested in accurate results in those regions because you already know the interesting results are elsewhere in the model.
Consider the example of a wheel rim, shown below.
The mesh on the left is composed only of tets, while the mesh on the right has tets (green), bricks (blue), and prisms (pink) as well as pyramids to transition between these. The mixed mesh uses smaller tets around the holes and corners, where we expect higher stresses. Bricks and prisms are used in the spokes and around the rim. Neither the rim nor the spokes will carry peak stresses (at least under a static load) and we can safely assume a relatively slow variation of the stresses in these regions. The tet mesh has about 145,000 elements and around 730,000 degrees of freedom. The mixed mesh has close to 78,000 elements and roughly 414,000 degrees of freedom, and takes about half as much time and memory to solve. The mixed mesh does take significant user interaction to set up, while the tet mesh requires essentially no user effort.
Another example is shown below, this time a structural analysis of a loaded spring. Since the deformation is quite uniform along the length of the helix of the spring, it makes sense to have a mesh that describes the overall shape and cross section, but with relatively stretched elements along the length of the wire. The prism mesh has 504 elements with 9,526 degrees of freedom, and the tet mesh has 3,652 elements with 23,434 degrees of freedom. So although the number of elements differs by about a factor of seven, the number of degrees of freedom differs by only about a factor of 2.5.
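The savings quoted for the two examples are easy to check with a few lines of arithmetic. The snippet below only restates the counts given in the text (the wheel rim figures are approximate in the original); the dictionary layout and the `reduction` helper are hypothetical names introduced for this sketch.

```python
# Element and degree-of-freedom (DOF) counts as quoted in the text.
# Each entry is (number of elements, number of DOFs).
meshes = {
    "wheel rim": {"tet": (145_000, 730_000), "alternative": (78_000, 414_000)},
    "spring": {"tet": (3_652, 23_434), "alternative": (504, 9_526)},
}


def reduction(case):
    """Return (element ratio, DOF ratio) of the tet mesh to the alternative mesh."""
    tet_elems, tet_dofs = meshes[case]["tet"]
    alt_elems, alt_dofs = meshes[case]["alternative"]
    return tet_elems / alt_elems, tet_dofs / alt_dofs


for case in meshes:
    elem_ratio, dof_ratio = reduction(case)
    print(f"{case}: {elem_ratio:.2f}x fewer elements, {dof_ratio:.2f}x fewer DOFs")
```

Since solution time and memory scale with the number of degrees of freedom rather than the number of elements, the DOF ratio is the more useful figure when judging whether the extra meshing effort pays off.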
The other significant motivation for using brick and prism elements is when the geometry contains very thin structures in one direction, such as an epitaxial layer of material on a wafer, a stamped sheet metal part, or a sandwiched composite.
For example, let’s look at the figure below, of a thin trace of material patterned onto a substrate. The tet mesh has very small elements in the trace, whereas the prism mesh is composed of thin elements in this region. Whenever your geometry has layers whose thickness is about 10^{-3} times the largest dimension of the part, or less, bricks and prisms become strongly motivated.
It is also worth pointing out that COMSOL offers many boundary conditions that can be used in lieu of explicitly modeling thin layers of materials. For example, in electromagnetics, the following four examples consider thin layers of material with relatively high and low conductivity, and relatively high and low permeability:
Similar types of boundary conditions exist in most of the physics interfaces. Usage of these types of boundary conditions will avoid the need to mesh such thin layers entirely.
Lastly, the above comments apply only to linear static finite element problems. Different meshing techniques are needed for nonlinear static problems, or if we are modeling time-domain or frequency-domain phenomena.
To summarize, here is what you should keep in mind when starting your meshing of linear static problems:
Let’s look at the problem of a flat plate under uniaxial tension with a square hole cut in it. This is similar to the example from earlier in that we can exploit symmetry and only model one quarter of the structure.
As before, we can use adaptive mesh refinement to let COMSOL insert more elements into regions where the error is estimated to be large:
We observe that smaller and smaller elements are being inserted at the sharp inside corner of the hole. Let’s also plot the stresses at this inside corner as a function of the mesh size:
From this plot, it appears that the stresses are getting larger and larger, no matter how fine we make the mesh. In fact, that is exactly what is occurring here; the stresses at the sharp corner are non-convergent with respect to mesh refinement because we have a singularity in the model. This is actually completely accurate — stresses in sharp corners are theoretically infinite. Whenever you see this kind of non-convergent behavior, it is likely that you are looking at the manifestation of a singularity in your model.
In structural engineering practice, sharp inside corners are something to avoid. You would be justified in saying that one way of preventing this problem is to round off the sharp corners of the model where these singularities appear. Doing so would lead to a model that predicts stresses that converge with mesh refinement, but it would still need to have a lot of elements in this inside corner. So let’s introduce other approaches to dealing with these singularities.
One approach to dealing with these singularities is to just ignore them. An important feature to understand about the finite element method is that it allows local inaccuracies, since it is formulated in a way that minimizes the global error in the model. The stresses predicted at the singularity itself in the above model are not meaningful, but if you evaluate the stresses at a distance of around two to three mesh elements away from the singularity, the stress solution there does converge. Therefore, if we’re interested in the stresses away from the singularity, the mere presence of the singularity does not pollute the predictions elsewhere.
It is also important to realize that the singularities manifest when you are taking the derivative of the solution field. In structural mechanics, we solve for the displacement field, \bf{u}, and compute the stresses from the strains, \bf{\sigma = C : \epsilon}, where strain is defined in terms of the gradient of the displacement field: \bf{\epsilon} = \frac{1}{2} [ (\nabla \bf{u})^T + \nabla \bf{u} ]. Once you see that the stress is computed from the gradient of the displacement field, it also becomes a little bit clearer why the solution for the stresses goes to infinity at a sharp corner. However, if you are only interested in the solution field, \bf{u}, this is not singular, even at the sharp corner, and does converge with mesh refinement.
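A one-dimensional stand-in makes the difference between the field and its gradient easy to see. Near a re-entrant corner, the solution typically behaves like r^\lambda with 0 < \lambda < 1, where r is the distance to the corner. The sketch below assumes \lambda = 2/3, the textbook exponent for Laplace's equation at a 270° corner; the exponent for the elasticity problem discussed here is different, so treat the numbers as illustrative only.

```python
# Model behavior near a sharp corner: u(r) ~ r**LAMBDA with 0 < LAMBDA < 1.
# LAMBDA = 2/3 is an assumed, illustrative exponent (Laplace equation, 270 degree corner).
LAMBDA = 2.0 / 3.0


def field(r):
    """Solution field at distance r from the corner; stays finite as r -> 0."""
    return r ** LAMBDA


def gradient(r):
    """Radial derivative of the field; grows without bound as r -> 0."""
    return LAMBDA * r ** (LAMBDA - 1.0)


# Sample ever closer to the corner, mimicking ever finer meshes.
distances = [10.0 ** -k for k in range(1, 7)]
samples = [(r, field(r), gradient(r)) for r in distances]

for r, u, du in samples:
    print(f"r = {r:.0e}   u = {u:.3e}   du/dr = {du:.3e}")
```

Each time the sampling point moves closer to the corner, the field value shrinks toward a finite limit while the derivative keeps growing, which is the pattern seen in the stress plot under mesh refinement.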
There is one more common case when a singularity is acceptable: if you are only interested in an integral quantity as output of your model. For instance, the total elastic strain energy of a system composed of a linear material is:
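For reference, a standard way to write this energy, assuming the stress \bf{\sigma} and strain \bf{\epsilon} defined earlier and using \Omega (a symbol introduced here) for the modeling domain, is:

```latex
W = \frac{1}{2} \int_{\Omega} \bf{\sigma} : \bf{\epsilon} \, d\Omega
```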
If we evaluate this for a domain that includes a singularity, such as the plate with the square hole, this integral will converge rapidly with mesh refinement, even though the integrand is non-convergent at one or more points inside the domain. So if the only quantity that you want to get out is a function of an integral over a domain (or boundary) within your model, then your model can include singularities. The integrated quantity will converge to essentially the same value whether you use a sharp or a rounded corner. This situation arises especially often in electromagnetics, where device inductance and capacitance are both evaluated as integrals of electric and magnetic fields over the domains.
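The reason an integral can converge while its integrand blows up can be checked numerically. The sketch below is a simplified stand-in, not the plate model: it assumes an energy density proportional to r^{2\lambda - 2} around a corner in 2D, with the illustrative exponent \lambda = 2/3, and integrates it per unit angle over a disc of radius 1 using a midpoint rule. The area element r dr tames the singularity, so the exact value is 1/(2\lambda) = 0.75.

```python
# Assumed model: energy density ~ r**(2*LAMBDA - 2) near a 2D corner.
# LAMBDA = 2/3 is an illustrative exponent, not the value for the plate problem.
LAMBDA = 2.0 / 3.0
EXACT = 1.0 / (2.0 * LAMBDA)  # integral of r**(2*LAMBDA - 1) over [0, 1]


def energy_density(r):
    """Singular integrand: unbounded as r -> 0."""
    return r ** (2.0 * LAMBDA - 2.0)


def integrate(n_cells):
    """Midpoint rule in r, per unit angle; returns (integral, largest sampled density)."""
    dr = 1.0 / n_cells
    total = 0.0
    peak = 0.0
    for i in range(n_cells):
        r = (i + 0.5) * dr
        density = energy_density(r)
        peak = max(peak, density)
        total += density * r * dr  # area element in polar coordinates
    return total, peak


coarse_total, coarse_peak = integrate(10)
fine_total, fine_peak = integrate(1000)

print(f"10 cells:   integral = {coarse_total:.5f}, peak density = {coarse_peak:.1f}")
print(f"1000 cells: integral = {fine_total:.5f}, peak density = {fine_peak:.1f}")
```

Refining the grid makes the peak sampled density larger, just as the corner stress grows with mesh refinement, while the integrated value settles toward 0.75.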
To summarize, there are three common situations when it is acceptable to have a singularity in the model:
In these cases, you will observe convergence of your solution with mesh refinement. That said, you should still be careful when you observe non-convergent behavior anywhere in your model, and make sure that it is not skewing your interpretation of the results.
Finally, there are cases where we do need to accurately compute the fields at these singularities, but our model may be so large that we don’t want to put fillets on all of these edges. In such cases, we can use a strategy called submodeling, or break-out modeling. This approach uses a relatively coarse mesh to find the solution field on a larger model that may contain singularities, and then passes this information on to a submodel that has a much finer mesh and rounded corners. This approach is presented in the Submodel in a Wheel Rim example.
]]>