This post is an expanded follow-on comment for Roger Sparks in this thread.
In particular, for an LTI system $A_d = e^{A_c T_s} \approx I + A_c T_s$, the approximation holding when the matrix norm $\lVert A_c T_s \rVert = \lVert A_c \rVert T_s \ll 1$ (a generalization of the approximation you may recall from this comment I left you on John Handley's blog). The subscripts $d$ and $c$ denote discrete and continuous time, respectively. Note that $A_d$ is different for every sample period $T_s$ used. $B_d$ is generally more complicated, but it can be approximated as $B_d \approx A^{-1}_c (A_d - I) B_c$. This expression is exact when $\mathbf{u}$ is a "zero order hold" (ZOH) function, meaning it stays constant over each sample period. More generally, for LTI systems with an arbitrary input $\mathbf{u}$ you must integrate:
$$
\mathbf{x}_{n+1} = e^{A_c T_s} \mathbf{x}_{n} + \int\limits_{0}^{T_s} e^{A_c (T_s - \tau)} B_c\, \mathbf{u}(\tau + n T_s) d \tau \tag 9
$$
and substituting for time-invariant $A_d$ from above:
$$
\mathbf{x}_{n+1} = A_d \mathbf{x}_{n} + \int\limits_{0}^{T_s} e^{A_c (T_s - \tau)} B_c\, \mathbf{u}(\tau + n T_s) d \tau \tag {9a}
$$
or more generally still, for a time-varying linear system:
$$
\mathbf{x}_{n+1} = \Phi_{(n+1) T_s, n T_s} \mathbf{x}_{n} + \int\limits_{n T_s}^{(n+1) T_s} \Phi_{(n+1) T_s, \tau} B_c (\tau) \mathbf{u}(\tau) d \tau \tag {10}
$$
and now substituting for a time varying version of the $A_d$ matrix (indexed by the sample period number $n$):
$$
\mathbf{x}_{n+1} = A_{dn} \mathbf{x}_{n} + \int\limits_{n T_s}^{(n+1) T_s} \Phi_{(n+1) T_s, \tau} B_c (\tau) \mathbf{u}(\tau) d \tau \tag {10a}
$$
where $\Phi_{t_2, t_1}$ is the state transition matrix from time $t_1$ to time $t_2$. Or you can, in principle, avoid the integration by doing a Taylor expansion on $\mathbf{u}$ and finding a separate $B_{di}$ for each $i^{th}$ term. As Brian has previously pointed out, a numerical integration accomplishes much the same thing, only instead of Taylor terms sampled at one time, it uses $\mathbf{u}$ sampled at several times over a single nominal sample period.
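To make the ZOH case concrete, here's a minimal Python sketch of the discretization above, using a made-up 2-state system (the particular $A_c$, $B_c$ and $T_s$ values are purely illustrative):

```python
import numpy as np
from scipy.linalg import expm

# Made-up 2-state continuous-time LTI system, just for illustration
A_c = np.array([[0.0, 1.0],
                [-2.0, -0.5]])
B_c = np.array([[0.0],
                [1.0]])
T_s = 0.1  # sample period

# Exact ZOH discretization: A_d = exp(A_c T_s), B_d = A_c^{-1} (A_d - I) B_c
# (scipy.signal.cont2discrete(..., method='zoh') performs the same discretization.)
A_d = expm(A_c * T_s)
B_d = np.linalg.solve(A_c, A_d - np.eye(2)) @ B_c

# First-order approximation, good when ||A_c|| * T_s << 1
A_d_approx = np.eye(2) + A_c * T_s

# One discrete step with u held constant over the sample period:
# x_{n+1} = A_d x_n + B_d u_n, i.e. (9a) with the integral done in closed form
x_n = np.array([[1.0], [0.0]])
u_n = np.array([[1.0]])
x_np1 = A_d @ x_n + B_d @ u_n
```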
Additionally, as I mentioned in a comment above, some non-linear dynamic systems can be approximated over each sample period as a linear system. For example, you may have a non-linear system such as $\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x},\mathbf{u},t)$, where $\mathbf{f}$, $\mathbf{x}$ and $\mathbf{u}$ are generally vectors. One way to accomplish this is to approximate $\mathbf{f}$ by $A_c$ and $B_c$, where these matrices are the Jacobians of $\mathbf{f}$ wrt $\mathbf{x}$ (i.e. $A_c = \partial \mathbf{f}/\partial\mathbf{x}$) and $\mathbf{u}$ (i.e. $B_c = \partial \mathbf{f}/\partial\mathbf{u}$) respectively, and to rewrite the differential equation in terms of a small perturbation of $\mathbf{x}$ (call it $\delta{\mathbf{x}}$) about a nominal solution $\mathbf{x}^*$, i.e. $\delta{\mathbf{x}} \equiv \mathbf{x} - \mathbf{x}^*$. Similarly define $\delta{\mathbf{u}} \equiv \mathbf{u} - \mathbf{u}^*$. Then the perturbation approximately satisfies the linear differential equation $\delta{\dot{\mathbf{x}}} = A_c \delta\mathbf{x} + B_c \delta\mathbf{u}$. The discrete time approximation I gave above ($A_d \approx I + A_c T_s$) can be seen in this light as a two-term vector Taylor expansion of $\mathbf{f}$ wrt $\mathbf{x}$, i.e. $A_d \approx I + (\partial\mathbf{f}/\partial\mathbf{x}) T_s$. Note that the "region of validity" for $A_c$ and $B_c$ may be small, so typically they are recalculated around different nominal values $\mathbf{x}^*$, $\mathbf{u}^*$ and times $t$ as the system evolves. An application might be, for example, a model of perturbations of a rocket about a nominal desired trajectory, for purposes of implementing a feedback control system to keep it on course. As long as the rocket is never allowed to wander too far off course, the approximation may be adequate, provided the system is "re-linearized" on a regular basis. This might suggest an economics application: a central bank "feedback control" rule for setting interest rates or determining the "high powered" money supply to "keep the economy on course" about some pre-determined solution to a set of non-linear differential equations, based on feedback from other measured macro variables, such as the inflation rate, the NGDP level or the unemployment rate.
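To illustrate the linearization recipe, here's a rough Python sketch with a hypothetical non-linear system (a damped pendulum with a torque input) standing in for a generic $\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x},\mathbf{u},t)$; the Jacobians are estimated by finite differences rather than derived in closed form:

```python
import numpy as np

# Hypothetical non-linear dynamics standing in for x_dot = f(x, u, t):
# a damped pendulum with torque input u
def f(x, u, t):
    return np.array([x[1],
                     -np.sin(x[0]) - 0.1 * x[1] + u[0]])

def jacobians(f, x_star, u_star, t, eps=1e-6):
    """Finite-difference Jacobians A_c = df/dx and B_c = df/du at a nominal (x*, u*, t)."""
    n, m = len(x_star), len(u_star)
    f0 = f(x_star, u_star, t)
    A_c = np.zeros((n, n))
    B_c = np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A_c[:, i] = (f(x_star + dx, u_star, t) - f0) / eps
    for j in range(m):
        du = np.zeros(m); du[j] = eps
        B_c[:, j] = (f(x_star, u_star + du, t) - f0) / eps
    return A_c, B_c

# Linearize about a nominal point; in practice this is redone as x*, u* evolve
T_s = 0.05
x_star = np.array([0.3, 0.0])
u_star = np.array([0.0])
A_c, B_c = jacobians(f, x_star, u_star, 0.0)

# Two-term expansion from above; to first order B_d ~ B_c * T_s, so
# delta_x_{n+1} ~ A_d delta_x_n + (B_c * T_s) delta_u_n
A_d = np.eye(2) + A_c * T_s
```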
Another huge advantage of linear systems is the ability to analyze and manipulate them in the "frequency domain" rather than the time domain analysis I give above. This means the ability to use continuous and discrete time Fourier transforms, Laplace transforms and Z-transforms. They also have advantages in terms of being able to analyze their response to noise, modeling errors and other uncertainties, and to design optimal "state estimators" (e.g. Kalman filters), perhaps designed to work in conjunction with optimal feedback or open loop control laws, which can effectively change the natural modes of the system to more desirable ones: for example, to stabilize an unstable system by moving its unstable eigenvalues out of the "right half plane" and into the left half plane (i.e. move modes with positive real components so that they instead have negative real components... in what's known, with regard to Laplace transforms, as the "S-plane"). The discrete time equivalent is to move unstable eigenvalues of $A_d$ from outside the "unit circle" to inside the unit circle (this unit circle drawn in the "Z-plane" associated with the Z-transform, which is the discrete time counterpart of the Laplace transform). Sometimes you hear economists speak of "unit roots." What they're referring to are "roots" (in the sense of roots of the "characteristic polynomial" of $A_d$, which are the eigenvalues, AKA "poles," of $A_d$) on the unit circle (corresponding to the imaginary axis of the S-plane in the continuous time world), which produce "marginally stable" outputs, such as non-rising, non-decaying step-like responses (the impulse response of a perfect "integrator") or constant amplitude oscillations (sin and cos waves, or more generally "complex exponentials," i.e. $c\, e^{i \omega n T_s}$ with $i \equiv \sqrt{-1}$, $c$ a constant and $\omega$ the frequency of oscillation, so that $\omega T_s$ is in radians per sample). Everything in quotes can be Googled for more information (those are all common terms).
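As a small numerical illustration of the S-plane / Z-plane correspondence (the $A_c$ below is made up and has one unstable mode):

```python
import numpy as np
from scipy.linalg import expm

# Made-up continuous-time A_c with one unstable mode (eigenvalue with positive real part)
A_c = np.array([[0.2, 1.0],
                [0.0, -1.0]])
T_s = 0.1
A_d = expm(A_c * T_s)

s_poles = np.linalg.eigvals(A_c)  # continuous-time poles (S-plane): stable iff Re < 0
z_poles = np.linalg.eigvals(A_d)  # discrete-time poles (Z-plane): z = exp(s * T_s)

print("S-plane poles:      ", s_poles)
print("Z-plane pole moduli:", np.abs(z_poles))  # stable iff |z| < 1
# A pole with real part exactly 0 maps to |z| = 1: the "unit root" case.
```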
I think you can probably see how important the $A$ matrix ($A_c$ or $A_d$) is to a linear system. It characterizes all the dynamics of the system, such as whether or not it's naturally stable or likely to oscillate, how fast it will respond to inputs (the time constants), and which modes are dominant (i.e. which modes are weighted with the largest coefficients). Of course the other matrices ($B$, $C$ and $D$) are also important, and you need them, for example, to analyze steady state responses or to determine if a system is "controllable" (can we control the states?) or "observable" (do we have enough sensors to observe all the states?). But $A$ is really the heart of the system.
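And here's a quick sketch of the standard rank tests for controllability and observability, again with made-up matrices:

```python
import numpy as np

# Made-up discrete-time system matrices
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
C = np.array([[1.0, 0.0]])

n = A.shape[0]

# Controllability matrix [B, AB, ..., A^(n-1)B]: full rank => inputs can steer all states
ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
# Observability matrix [C; CA; ...; CA^(n-1)]: full rank => states recoverable from outputs
obsv = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

print("controllable:", np.linalg.matrix_rank(ctrb) == n)
print("observable:  ", np.linalg.matrix_rank(obsv) == n)
```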
-----------------------------------------------------------------------------------------------------------------
Appendix A: Details about one way to calculate discrete time matrices for non-linear systems
For the non-linear approximate discrete time case, it's sometimes beneficial to numerically calculate $A_d$ and $B_d$ from numerical solutions to the non-linear equations directly, rather than via the partial derivative (Jacobian) technique discussed above, especially if you have some idea of the covariance of $\delta \mathbf{x}$ (call it $C_\mathbf{x}$) and of $\delta \mathbf{u}$ (call it $C_\mathbf{u}$). For example, say the eigenvalues (variances) of $C_\mathbf{x}$ are $\{\sigma^2_{\mathbf{x}, i}\}$, for $i = 1,...,n$ (assuming $\mathbf{x}$ is $n$-dimensional), and the corresponding orthonormal matrix of eigenvectors is $V_\mathbf{x} = [\mathbf{v}_{\mathbf{x}, i}]$. The eigenvectors can be chosen orthonormal because covariance matrices are symmetric and positive semi-definite; we'll assume positive definite (i.e. a full $n$ degrees of freedom, equivalently all variances $> 0$) for this exercise. Then a good way to calculate an $A_{d n}$ that captures some of the effects of the higher order terms of the Taylor expansion, as well as the dependence of $\mathbf{f}$ on $t$ (which we've so far assumed to be negligible over a sample period), is the following sum of outer products:
$$
A_{d n} = \sum_{i=1}^n [\mathbf{x}_i (t_{n+1}) - \mathbf{x}^* (t_{n+1})] \cdot \mathbf{v}^T_{\mathbf{x}, i} /\sigma_{\mathbf{x}, i} \tag {11}
$$
where $t_{m} \equiv m T_s$ and $\mathbf{x}_i (t_{n+1})$ is the numerical solution at time $t_{n+1}$ to the non-linear differential equation after perturbing the nominal solution $\mathbf{x}^* (t_{n})$ by $\mathbf{v}_{\mathbf{x}, i} \sigma_{\mathbf{x}, i}$ and propagating it forward by one sample period $T_s$ to time $t_{n+1}$. In other words we form $\mathbf{x}_i (t_n) = \mathbf{x}^* (t_{n}) + \delta \mathbf{x}_{i, n}$ where $\delta \mathbf{x}_{i, n} = \mathbf{v}_{\mathbf{x}, i} \sigma_{\mathbf{x}, i}$ and propagate that forward one time step to get $\mathbf{x}_i (t_{n+1})$. In general $C_\mathbf{x}$ and $C_\mathbf{u}$ will also be functions of time, and thus so will their eigenvalues and eigenvectors (though I've neglected a time index for them in the above). An analogous procedure exists for finding $B_{d n}$ using a formula similar to $(11)$ but with the eigenvalues and eigenvectors of $C_\mathbf{u}$ rather than $C_\mathbf{x}$. This procedure (for both $A_d$ and $B_d$) scales the non-linear effects by the size of the standard deviations ($\sigma$) of each degree of freedom in each covariance, thus in some sense capturing the expected size of the effects of higher order terms / non-linearities. Note that $(11)$ is a one-sided difference, whereas a more robust method would use both positive and negative perturbations ($\pm \delta$).
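Here's a rough Python sketch of the procedure in $(11)$, using a hypothetical non-linear system with no input term and a made-up covariance $C_\mathbf{x}$, just to show the mechanics:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical non-linear dynamics (no input term, to keep the sketch short)
def f(t, x):
    return np.array([x[1], -np.sin(x[0]) - 0.1 * x[1]])

def propagate(x0, t0, T_s):
    """Numerically integrate the non-linear ODE one sample period forward."""
    sol = solve_ivp(f, (t0, t0 + T_s), x0, rtol=1e-9, atol=1e-12)
    return sol.y[:, -1]

def A_dn_from_covariance(x_star_n, t_n, T_s, C_x):
    """Equation (11): build A_dn from 1-sigma perturbations along the
    eigenvectors of the state covariance C_x (one-sided differences)."""
    n = len(x_star_n)
    variances, V = np.linalg.eigh(C_x)  # eigenvalues (variances) and orthonormal eigenvectors
    sig = np.sqrt(variances)
    x_star_np1 = propagate(x_star_n, t_n, T_s)
    A_dn = np.zeros((n, n))
    for i in range(n):
        # Perturb the nominal by v_i * sigma_i, propagate, and form the outer product
        x_i_np1 = propagate(x_star_n + V[:, i] * sig[i], t_n, T_s)
        A_dn += np.outer(x_i_np1 - x_star_np1, V[:, i] / sig[i])
    return A_dn

# Example with made-up numbers
C_x = np.diag([0.05**2, 0.02**2])
A_dn = A_dn_from_covariance(np.array([0.3, 0.0]), 0.0, 0.05, C_x)
```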
One way to picture what's going on in $(11)$ is to imagine that $C_\mathbf{x}$ describes a multivariate spatial Gaussian probability distribution that's long and narrow (i.e. its 1-$\sigma$ ellipsoidal iso-probability density contour is cigar shaped). Then a perturbation in $\mathbf{x}$ from the nominal is most likely to be located ~$1 \sigma$ one direction or the other along the major axis of this distribution (with $\mathbf{x}^*$ right in the middle of course). For the sake of argument, say it's exactly $1 \sigma$ along the positive major axis (the argument actually works for any principal axis of the ellipsoid, but imagine the major one), and that it has no perturbation components along the other axes. Then the inner product of this perturbation with the row vector $\mathbf{v}^T_{\mathbf{x}, j}/\sigma_{\mathbf{x}, j}$ will be $1$, and it will be $0$ with $\mathbf{v}^T_{\mathbf{x}, i}/\sigma_{\mathbf{x}, i},\; i \ne j$, where we're assuming the major axis is the $j^{th}$. That's because $V_\mathbf{x}$ is orthonormal (i.e. its columns are unit length and mutually orthogonal). Such a perturbation will give an exact result for the resultant perturbation propagated to time $t_{n+1}$ (assuming $\delta \mathbf{u} = 0$). In other words, such a perturbation at time $t_n$ (call it $\delta \mathbf{x}_j (t_n)$) will give an exact result for $\delta \mathbf{x}_j (t_{n+1}) = A_{d n} \delta \mathbf{x}_j (t_n)$. This is because, substituting $(11)$ for $A_{d n}$, we have:
$$
\begin{align}
A_{d n} \delta \mathbf{x}_j (t_n) & = \left( \sum_{i=1}^n [\mathbf{x}_i (t_{n+1}) - \mathbf{x}^* (t_{n+1})] \cdot \mathbf{v}^T_{\mathbf{x}, i, n} /\sigma_{\mathbf{x}, i, n} \right) \cdot \delta \mathbf{x}_j (t_n)\\
& = \left( \sum_{i=1}^n [\mathbf{x}_i (t_{n+1}) - \mathbf{x}^* (t_{n+1})] \cdot \mathbf{v}^T_{\mathbf{x}, i, n} /\sigma_{\mathbf{x}, i, n} \right) \cdot \mathbf{v}_{\mathbf{x}, j, n} \sigma_{\mathbf{x}, j, n}\\
& = \sum_{i=1}^n [\mathbf{x}_i (t_{n+1}) - \mathbf{x}^* (t_{n+1})] (\mathbf{v}^T_{\mathbf{x}, i, n} \cdot \mathbf{v}_{\mathbf{x}, j, n}) ( \sigma_{\mathbf{x}, j, n} /\sigma_{\mathbf{x}, i, n}) \\
& = [\mathbf{x}_j (t_{n+1}) - \mathbf{x}^* (t_{n+1})] (\mathbf{v}^T_{\mathbf{x}, j, n} \cdot \mathbf{v}_{\mathbf{x}, j, n}) ( \sigma_{\mathbf{x}, j, n}/\sigma_{\mathbf{x}, j, n})\\
& = \mathbf{x}_j (t_{n+1}) - \mathbf{x}^* (t_{n+1})\\
& = \delta \mathbf{x}_j (t_{n+1}) \tag {12}
\end{align}
$$
where I've added time indices where appropriate. Thus, in some sense, a "likely" perturbation gives exact results, non-linearities and all. The Jacobian method does not capture this, and may be equally difficult to calculate (although when closed form expressions are available, it's likely easier) [1]. Of course, this exact "likely" perturbation still has probability = 0, so we don't expect to actually avoid non-linear distortions altogether, but in some sense we can expect to account for them in a way not captured by the Jacobian method. Doing so is perhaps a bit conservative, but if our purpose is (for example) to keep track of how errors propagate through the system (e.g. with a Kalman filter), it's generally better to err on the conservative side, lest we fool ourselves into thinking we have a better state estimate than we actually do (which can cause us to weight new information ("innovations") with smaller gains than we should at future times). You can think of this problem as wrongly adopting overly strong "priors." If a more conservative result is desired, the perturbations can be made to be greater than 1-$\sigma$ in size.
[1] In fact, you can think of this method as a crude numerical estimate of $e^{(\partial\mathbf{f}/\partial\mathbf{x}) T_s}$, where the $\partial \mathbf{x}$ deviations (and thus the implicit $\partial \mathbf{f}$ deviations) aren't very infinitesimal at all, and where $T_s$ is implicitly folded into the differencing estimate directly rather than as a separate multiplier. But the point isn't to approximate the matrix exponential, but to find something better. However, if the system were actually LTI, it would be an exact representation of $e^{(\partial\mathbf{f}/\partial\mathbf{x}) T_s}$. And if the system were linear but time varying, it would produce exactly $\Phi_{n+1, n}$ where I've simplified the time indices from $(10)$ above.
-----------------------------------------------------------------
See comment below.
Below: example of error propagation (falling objects with initial horizontal velocity). The blue curve includes air resistance and the red curve doesn't; the blue trajectory is non-linear and the red is linear. Error ellipses are 1-sigma.
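(For reference, the kind of one-step error-covariance propagation that produces 1-sigma ellipses like these might look like the following sketch; the $A_{dn}$, covariance and process noise numbers are made up:)

```python
import numpy as np

def propagate_covariance(A_dn, P_n, Q):
    """One-step error-covariance propagation (Kalman-filter prediction style):
    P_{n+1} = A_dn P_n A_dn^T + Q."""
    return A_dn @ P_n @ A_dn.T + Q

# Made-up numbers for illustration
A_dn = np.array([[1.0, 0.05],
                 [-0.05, 0.99]])
P_n = np.diag([0.05**2, 0.02**2])  # current error covariance (1-sigma ellipse)
Q   = np.diag([1e-6, 1e-6])        # process noise (keeps the estimate conservative)
P_np1 = propagate_covariance(A_dn, P_n, Q)
```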