This post is an expanded follow-on comment for Roger Sparks in this thread.
In particular, for an LTI system $A_d = e^{A_c T_s} \approx I + A_c T_s$ (the approximation holding when the matrix norm $\|A_c T_s\| = \|A_c\| T_s \ll 1$, ... a generalization of the approximation you may recall from this comment I left you on John Handley's blog). The subscripts $d$ and $c$ denote discrete and continuous time, respectively. Note that $A_d$ is different for every sample period $T_s$ used. $B_d$ is generally more complicated, but can be approximated as $B_d \approx A_c^{-1}(A_d - I)B_c$. This approximation is exact when $u$ is a "zero order hold" (ZOH) function, meaning it stays constant over each sample period. More generally, for LTI systems with an arbitrary input $u$ you must integrate:
$$x_{n+1} = e^{A_c T_s} x_n + \int_0^{T_s} e^{A_c (T_s - \tau)} B_c\, u(\tau + n T_s)\, d\tau$$
and substituting the time-invariant $A_d$ from above:
$$x_{n+1} = A_d x_n + \int_0^{T_s} e^{A_c (T_s - \tau)} B_c\, u(\tau + n T_s)\, d\tau$$
or more generally still, for a time-varying linear system:
$$x_{n+1} = \Phi_{(n+1)T_s,\, n T_s}\, x_n + \int_{n T_s}^{(n+1)T_s} \Phi_{(n+1)T_s,\, \tau}\, B_c(\tau)\, u(\tau)\, d\tau$$
and now substituting a time-varying version of the $A_d$ matrix (indexed by the sample period number $n$):
$$x_{n+1} = A_{d_n} x_n + \int_{n T_s}^{(n+1)T_s} \Phi_{(n+1)T_s,\, \tau}\, B_c(\tau)\, u(\tau)\, d\tau \qquad (10)$$
where $\Phi_{t_2, t_1}$ is the state transition matrix from time $t_1$ to time $t_2$. Or you can, in principle, avoid the integration by doing a Taylor expansion on $u$ and find a separate $B_{d_i}$ for each $i$th term. As Brian has previously pointed out, a numerical integration accomplishes much the same thing, only instead of Taylor terms sampled at one time, it uses $u$ sampled at several times over a single nominal sample period.
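To make the ZOH case above concrete, here's a minimal sketch in Python (NumPy/SciPy assumed; the example matrices and sample period are my own choices, not anything from the thread) that computes the exact $A_d$ and $B_d$, compares $A_d$ to the two-term approximation, and cross-checks against SciPy's built-in ZOH discretization:

```python
import numpy as np
from scipy.linalg import expm
from scipy.signal import cont2discrete

Ts = 0.1                                   # sample period (arbitrary choice)
Ac = np.array([[0.0, 1.0],
               [-2.0, -0.5]])              # example continuous-time A (damped oscillator)
Bc = np.array([[0.0],
               [1.0]])

Ad = expm(Ac * Ts)                                 # exact: Ad = e^(Ac Ts)
Bd = np.linalg.inv(Ac) @ (Ad - np.eye(2)) @ Bc     # exact under ZOH (Ac invertible here)
Ad_approx = np.eye(2) + Ac * Ts                    # two-term approx, valid when ||Ac|| Ts << 1

# Cross-check against SciPy's ZOH discretization:
Ad_sp, Bd_sp, *_ = cont2discrete((Ac, Bc, np.eye(2), np.zeros((2, 1))), Ts, method='zoh')
print(np.allclose(Ad, Ad_sp), np.allclose(Bd, Bd_sp))   # True True
print(np.linalg.norm(Ad - Ad_approx))                    # small, since ||Ac|| Ts ~ 0.2 here
```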
Additionally, as I mentioned in a comment above, some non-linear dynamic systems can be approximated over each sample period as a linear system. For example, you may have a non-linear system such as $\dot{x} = f(x, u, t)$, where $f$, $x$ and $u$ are generally vectors. One way to accomplish this is to approximate $f$ by $A_c$ and $B_c$, where these matrices are the Jacobians of $f$ with respect to $x$ (i.e. $A_c = \partial f/\partial x$) and $u$ (i.e. $B_c = \partial f/\partial u$) respectively, and to rewrite the differential equation in terms of a small perturbation of $x$ (call it $\delta x$) about a nominal solution $x^*$, i.e. $\delta x \equiv x - x^*$. Similarly define $\delta u \equiv u - u^*$. Then we can write an approximate perturbation form of this non-linear differential equation: $\dot{\delta x} = A_c \delta x + B_c \delta u$. The discrete time approximation I gave above ($A_d \approx I + A_c T_s$) can be seen in this light as a two-term vector Taylor expansion of $f$ with respect to $x$, i.e. $A_d \approx I + (\partial f/\partial x) T_s$. Note that the "region of validity" for $A_c$ and $B_c$ may be small, and typically they are recalculated around different nominal values $x^*$, $u^*$ and time $t$ as the system evolves. An application might be, for example, a model of perturbations of a rocket about a nominal desired trajectory, for purposes of implementing a feedback control system to keep it on course. As long as it is never allowed to wander too far off course, the approximation may be adequate, provided the system is "re-linearized" on a regular basis. This might suggest an economics application: a central bank "feedback control" rule for setting interest rates or determining the "high powered" money supply to "keep the economy on course" about some pre-determined solution to a set of non-linear differential equations, based on feedback from other measured macro variables, such as the inflation rate, the NGDP level or the unemployment rate.
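As an illustration of that Jacobian linearization step, here's a sketch only (the pendulum-like $f$, the nominal point, and the finite-difference approach are my assumptions for the example):

```python
import numpy as np

def f(x, u):
    # x = [angle, angular rate]; u = [torque]; an illustrative nonlinear pendulum
    return np.array([x[1], -9.81 * np.sin(x[0]) - 0.1 * x[1] + u[0]])

def jacobians(f, x_star, u_star, eps=1e-6):
    """Finite-difference Jacobians Ac = df/dx and Bc = df/du at (x_star, u_star)."""
    n, m = len(x_star), len(u_star)
    f0 = f(x_star, u_star)
    Ac, Bc = np.zeros((n, n)), np.zeros((n, m))
    for i in range(n):                       # one column of Ac per state perturbation
        dx = np.zeros(n); dx[i] = eps
        Ac[:, i] = (f(x_star + dx, u_star) - f0) / eps
    for j in range(m):                       # one column of Bc per input perturbation
        du = np.zeros(m); du[j] = eps
        Bc[:, j] = (f(x_star, u_star + du) - f0) / eps
    return Ac, Bc

Ts = 0.01
Ac, Bc = jacobians(f, np.array([0.5, 0.0]), np.array([0.0]))
Ad = np.eye(2) + Ac * Ts                     # two-term expansion: Ad ≈ I + (df/dx) Ts
```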
Another huge advantage of linear systems is the ability to analyze and manipulate them in the "frequency domain" rather than the time domain analysis I give above. This means the ability to use continuous and discrete time Fourier transforms, Laplace transforms and Z-transforms. They also have advantages in terms of being able to analyze their response to noise, modeling errors and other uncertainties, and to design optimal "state estimators" (e.g. Kalman filters), perhaps designed to work in conjunction with optimal feedback or open loop control laws, which can effectively change the natural modes of the system to more desirable ones: for example, to stabilize an unstable system by moving its unstable eigenvalues out of the "right half plane" (i.e. move modes with positive real components such that they instead have negative real components... in what's known, with regard to Laplace transforms, as the "S-plane"). The discrete time equivalent is to move unstable eigenvalues of $A_d$ from outside the "unit circle" to inside the unit circle (this unit circle drawn in the "Z-plane" associated with the Z-transform, which is the discrete time counterpart to the Laplace transform). Sometimes you hear economists speak of "unit roots." What they're referring to are "roots" (in the sense of roots of the "characteristic polynomial" of $A_d$, which are the eigenvalues, AKA "poles," of $A_d$) on the unit circle (corresponding to the imaginary axis of the S-plane in the continuous time world), which produce "marginally stable" outputs, such as non-rising, non-decaying step responses to an impulse (the response of a perfect "integrator") or constant amplitude oscillations (sin and cos waves, or more generally "complex exponentials," i.e. $c e^{i \omega n T_s}$ with $i \equiv \sqrt{-1}$, $c$ a constant and $\omega$ the unitless frequency of oscillation, which you can practically take to be radians per sample). Everything in quotes can be Googled for more information (those are all common terms).
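A quick sketch of those stability checks (example matrices are mine): continuous-time eigenvalues should have negative real parts in the S-plane, the corresponding discrete-time eigenvalues should sit inside the unit circle, and a perfect integrator lands exactly on the unit circle, i.e. a "unit root":

```python
import numpy as np
from scipy.linalg import expm

Ac = np.array([[0.0, 1.0],
               [-2.0, -0.5]])                # example stable continuous-time system
Ts = 0.1
Ad = expm(Ac * Ts)

print(np.linalg.eigvals(Ac).real < 0)        # stable iff all eigenvalues have Re < 0 (S-plane)
print(np.abs(np.linalg.eigvals(Ad)) < 1)     # stable iff all eigenvalues inside the unit circle

# A "unit root": a perfect integrator (Ac = 0) discretizes to Ad = 1, right on the unit circle.
print(np.abs(np.linalg.eigvals(expm(np.zeros((1, 1)) * Ts))))   # [1.]
```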
I think you can probably see how important the $A$ matrix ($A_c$ or $A_d$) is to a linear system. It characterizes all the dynamics of the system, such as whether or not it's naturally stable or likely to oscillate, how fast it will respond to inputs (the time constants), and which modes are dominant (i.e. which modes are weighted with the largest coefficients). Of course the other matrices ($B$, $C$ and $D$) are also important, and you need them, for example, to analyze steady state responses or to determine if a system is "controllable" (can we control the states?) or "observable" (do we have enough sensors to observe all the states?). But $A$ is really the heart of the system.
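For what it's worth, the standard rank tests for controllability and observability are easy to sketch (example system assumed, not from the original discussion):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -0.5]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])                   # one sensor, measuring the first state

n = A.shape[0]
ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])  # [B, AB, ...]
obsv = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])  # [C; CA; ...]
print(np.linalg.matrix_rank(ctrb) == n)      # controllable?
print(np.linalg.matrix_rank(obsv) == n)      # observable?
```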
-----------------------------------------------------------------------------------------------------------------
Appendix A: Details about one way to calculate discrete time matrices for non-linear systems
For the non-linear approximate discrete time case, it's sometimes beneficial to numerically calculate $A_d$ and $B_d$ from numerical solutions to the non-linear equations directly, rather than via the partial differential Jacobian technique discussed above, especially if you have some idea of the covariance of $\delta x$ (call it $C_x$) and of $\delta u$ (call it $C_u$). For example, say the eigenvalues (variances) of $C_x$ are $\{\sigma_{x,i}^2\}$, for $i = 1, \dots, n$ (assuming $x$ is $n$-dimensional), and the corresponding orthonormal matrix of eigenvectors is $V_x = [v_{x,i}]$. The eigenvectors can be chosen orthonormal since covariance matrices are symmetric; they're also always positive semi-definite, and we'll assume positive definite (i.e. a full $n$ degrees of freedom, equivalently all variances $> 0$) for this exercise. Then a good way to calculate an $A_{d_n}$ which captures some of the effects of the higher order terms of the Taylor expansion, as well as the dependence of $f$ on $t$ (which we've so far assumed to be negligible over a sample period), is the following sum of outer products:
$$A_{d_n} = \sum_{i=1}^{n} \left[ x_i(t_{n+1}) - x^*(t_{n+1}) \right] v_{x,i}^T / \sigma_{x,i} \qquad (11)$$
where $t_m \equiv m T_s$ and $x_i(t_{n+1})$ is the numerical solution at time $t_{n+1}$ to the non-linear differential equation after perturbing the nominal solution $x^*(t_n)$ by $v_{x,i}\sigma_{x,i}$ and propagating it forward by one sample period $T_s$ to time $t_{n+1}$. In other words, we form $x_i(t_n) = x^*(t_n) + \delta x_{i,n}$ where $\delta x_{i,n} = v_{x,i}\sigma_{x,i}$, and propagate that forward one time step to get $x_i(t_{n+1})$. In general $C_x$ and $C_u$ will also be functions of time, and thus so will their eigenvalues and eigenvectors (though I've neglected a time index for them in the above). An analogous procedure exists for finding $B_{d_n}$ using a formula similar to (11), but with the eigenvalues and eigenvectors of $C_u$ rather than $C_x$. This procedure (for both $A_d$ and $B_d$) scales the non-linear effects by the size of the standard deviations ($\sigma$) of each degree of freedom in each covariance, thus in some sense capturing the expected size of the effects of higher order terms / non-linearities. Note that (11) is a one-sided difference, whereas a more robust method would use both positive and negative perturbations ($\pm\delta$).
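Here's a minimal sketch of (11) in code (the dynamics in f and the covariance are made-up stand-ins; propagate() is just a numerical ODE solve over one sample period):

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(t, x):                                 # illustrative nonlinear dynamics (a pendulum)
    return np.array([x[1], -np.sin(x[0])])

def propagate(x0, t0, Ts):
    """Numerically solve the nonlinear ODE forward one sample period."""
    sol = solve_ivp(f, (t0, t0 + Ts), x0, rtol=1e-9, atol=1e-9)
    return sol.y[:, -1]

def Ad_from_covariance(x_star_n, Cx, t_n, Ts):
    """Equation (11): sum of outer products from 1-sigma perturbations along Cx's axes."""
    n = len(x_star_n)
    sig2, V = np.linalg.eigh(Cx)             # variances and orthonormal eigenvectors of Cx
    sigma = np.sqrt(sig2)
    x_star_next = propagate(x_star_n, t_n, Ts)
    Ad = np.zeros((n, n))
    for i in range(n):
        xi = x_star_n + V[:, i] * sigma[i]   # perturb 1-sigma along the i-th eigenvector
        Ad += np.outer(propagate(xi, t_n, Ts) - x_star_next, V[:, i] / sigma[i])
    return Ad

Cx = np.diag([1e-4, 4e-4])                   # assumed state covariance at t_n
Ad_n = Ad_from_covariance(np.array([0.5, 0.0]), Cx, 0.0, 0.1)
```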
One way to picture what's going on in (11) is to imagine that $C_x$ describes a multivariate spatial Gaussian probability distribution that's long and narrow (i.e. its 1-$\sigma$ ellipsoidal iso-probability-density contour is cigar shaped). Then a perturbation in $x$ from the nominal is most likely to be located ~1$\sigma$ one direction or the other along the major axis of this distribution (with $x^*$ right in the middle, of course). For the sake of argument, say it's exactly 1$\sigma$ along the positive major axis (it actually works for any principal axis of the ellipsoid, but imagine the major one for the sake of argument), and it has no perturbation components along the other axes. Then the inner product of this perturbation with the row vector $v_{x,j}^T/\sigma_{x,j}$ will be 1, and it will be 0 with $v_{x,i}^T/\sigma_{x,i}$, $i \neq j$, where we're assuming the major axis is the $j$th. That's because $V_x$ is orthonormal (i.e. its columns are unit length and mutually orthogonal). Such a perturbation will give an exact result for the resultant perturbation propagated to time $t_{n+1}$ (assuming $\delta u = 0$). In other words, such a perturbation at time $t_n$ (call it $\delta x_j(t_n)$) will give an exact result for $\delta x_j(t_{n+1}) = A_{d_n}\delta x_j(t_n)$. This is because, substituting (11) for $A_{d_n}$, we have:
$$\begin{aligned} A_{d_n}\delta x_j(t_n) &= \left( \sum_{i=1}^{n} \left[ x_i(t_{n+1}) - x^*(t_{n+1}) \right] v_{x,i,n}^T / \sigma_{x,i,n} \right) \delta x_j(t_n) \\ &= \left( \sum_{i=1}^{n} \left[ x_i(t_{n+1}) - x^*(t_{n+1}) \right] v_{x,i,n}^T / \sigma_{x,i,n} \right) v_{x,j,n}\, \sigma_{x,j,n} \\ &= \sum_{i=1}^{n} \left[ x_i(t_{n+1}) - x^*(t_{n+1}) \right] \left( v_{x,i,n}^T v_{x,j,n} \right) \left( \sigma_{x,j,n} / \sigma_{x,i,n} \right) \\ &= \left[ x_j(t_{n+1}) - x^*(t_{n+1}) \right] \left( v_{x,j,n}^T v_{x,j,n} \right) \left( \sigma_{x,j,n} / \sigma_{x,j,n} \right) \\ &= x_j(t_{n+1}) - x^*(t_{n+1}) = \delta x_j(t_{n+1}) \end{aligned}$$
where I've added time indices where appropriate. Thus, in some sense, a "likely" perturbation gives exact results, non-linearities and all. The Jacobian method does not capture this, and may be equally difficult to calculate (although when closed form expressions are available, it's likely easier) [1]. Of course, this exact "likely" perturbation still has probability = 0, so we don't expect to actually avoid non-linear distortions altogether, but in some sense we can expect to account for them in a way not captured by the Jacobian method. Doing so is perhaps a bit conservative, but if our purpose is (for example) to keep track of how errors propagate through the system (e.g. with a Kalman filter), it's generally better to err on the conservative side, lest we fool ourselves into thinking we have a better state estimate than we actually do (which can cause us to weight new information ("innovations") with smaller gains than we should at future times). You can think of this problem as wrongly adopting overly strong "priors." If a more conservative result is desired, the perturbations can be made greater than 1-$\sigma$ in size.
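And a quick numerical check of the claim just derived, reusing the propagate and Ad_from_covariance sketches from above: a 1-$\sigma$ perturbation along any eigenvector of $C_x$ propagates exactly, non-linearities included (up to the ODE solver's tolerance):

```python
# Continuation of the sketch above (f, propagate, Ad_from_covariance as defined there):
import numpy as np

x_star = np.array([0.5, 0.0])
t_n, Ts = 0.0, 0.1
Cx = np.diag([1e-4, 4e-4])
sig2, V = np.linalg.eigh(Cx)
Ad_n = Ad_from_covariance(x_star, Cx, t_n, Ts)

j = 1                                        # pick any principal axis of Cx
dx_j = V[:, j] * np.sqrt(sig2[j])            # the "likely" 1-sigma perturbation
lhs = Ad_n @ dx_j                            # propagation via the reconstructed Ad_n
rhs = propagate(x_star + dx_j, t_n, Ts) - propagate(x_star, t_n, Ts)  # exact propagation
print(np.allclose(lhs, rhs))                 # True, non-linearities and all
```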
[1] In fact, you can think of this method as a crude numerical estimate of $e^{(\partial f/\partial x) T_s}$, where the $\partial x$ deviations (and thus the implicit $\partial f$ deviations) aren't very infinitesimal at all, and where $T_s$ is implicitly folded into the differencing estimate directly rather than appearing as a separate multiplier. But the point isn't to approximate the matrix exponential; it's to find something better. However, if the system were actually LTI, this would be an exact representation of $e^{(\partial f/\partial x) T_s}$. And if the system were linear but time varying, it would produce exactly $\Phi_{n+1,n}$, where I've simplified the time indices from (10) above.
-----------------------------------------------------------------
Below: Example of error propagation (falling objects with initial horizontal velocity): the blue curve includes air resistance and the red curve doesn't, i.e. the blue trajectory is non-linear and the red is linear. Error ellipses are 1-$\sigma$.
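For completeness, here's a minimal sketch of the sort of linearized error-ellipse propagation shown in the figure (all parameters assumed; this is not the code behind the plot): the covariance $P$ is pushed through the discrete dynamics via $P \leftarrow A_d P A_d^T$, and the 1-$\sigma$ ellipse axes come from its eigen-decomposition:

```python
import numpy as np

Ts = 0.1
Ad = np.array([[1.0, Ts],
               [0.0, 1.0]])                  # e.g. simple position/velocity kinematics
P = np.diag([0.5**2, 0.2**2])                # initial 1-sigma errors: 0.5 m, 0.2 m/s

for _ in range(50):                          # propagate 5 seconds with no process noise
    P = Ad @ P @ Ad.T

sig2, V = np.linalg.eigh(P)                  # ellipse semi-axes = sqrt(eigenvalues),
print(np.sqrt(sig2))                         # oriented along the eigenvectors (columns of V)
```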