Approximation of Discrete and Orbital Koopman Operators over Subsets and Manifolds

This paper describes a general approach for the construction of approximations of the Koopman operator associated with deterministic continuous semiflows over a complete metric space $X$. A primary contribution of the paper is the derivation of rates of convergence for approximations of Koopman operators for large classes of evolution equations. The rates of convergence in the paper are derived in two substantially different scenarios. If the state space $X$ is compact, we consider the case in which the samples $\Xi$ are dense in the entire state space $X$. We also study cases in which the samples are dense in a limiting subset $\Omega \subset X$ that is a proper subset of $X$. Two general classes of methods are described in the paper, referred to as intrinsic and extrinsic methods. Intrinsic methods define bases of approximation from kernels that are defined in terms of, or with knowledge of, the limiting set $\Omega$. Extrinsic methods use kernels that do not depend on knowledge of the limiting set $\Omega$. In both types of approximations, the regularity of the underlying set and the smoothness of the space of functions on which the Koopman operator acts determine the rate of approximation. In the strongest error bounds derived in the paper, it is shown that the error in approximation of the Koopman operator decays like $O(h^{p}_{\Omega_n,\Omega})$, where $h_{\Omega_n,\Omega}$ is the fill distance of the samples $\Omega_n$ in the limiting set $\Omega$ and $p$ is an exponent related to the choice of the kernel and the smoothness of functions on which the Koopman operator acts. Such error bounds are obtained when the limiting subset is $\Omega = X$, when it is a proper subset $\Omega \subset X$ that is sufficiently regular, or when it is a type of smooth manifold $\Omega = M$.


Introduction
In this paper we derive a family of error estimates for the approximation of certain kinds of Koopman operators that are defined for deterministic semidynamical systems. Koopman operators $U_f$ for semiflows in discrete time, as well as orbital Koopman operators $U_\phi$ for continuous time semiflows, are studied. Here $f$ and $\phi$ are functions that govern the evolution of a discrete dynamical system and the state trajectory generated by a continuous dynamical system, respectively. For each of the two Koopman operators, two different approximations are considered: (i) projection-based approximation, (ii) data-dependent approximation. The principal theoretical results of this paper consist of estimates that make precise in what sense these approximations of the Koopman operators converge to their infinite-dimensional counterparts.
Error bounds are derived for cases when the Koopman operators act on a reproducing kernel Hilbert (RKH) space $H_X$ that is embedded in a Sobolev space. The RKH space $H_X$ of functions over $X = \mathbb{R}^d$ is defined in terms of a reproducing kernel $K : X \times X \to \mathbb{R}$. The Koopman operator approximations are built using finite-dimensional spaces of approximants $H_{\Omega_n} = \operatorname{span}\{K_\xi \mid \xi \in \Omega_n\}$ with $K_\xi := K(\xi, \cdot)$. Here, the term $\Omega_n$ represents a set of samples collected along the trajectory of the dynamical system. The assumption is that the collection of samples becomes dense in a limiting set $\Omega$ as $n$ increases. That is, the set $\Xi = \bigcup_{n \in \mathbb{N}} \Omega_n$ is dense in the limiting set $\Omega \subseteq X$. The theorems presented in this work illustrate that the convergence rate depends on (i) the smoothness properties of the native space $H_X$, (ii) the structure of $\Omega$. In particular, we prove that the convergence rate is directly proportional to the smoothness of $H_X$. In addition, the convergence results become sharper as knowledge about the limiting set $\Omega$ improves. For example, error bounds are derived that are applicable to samplings collected over portions of a trajectory for the system in Example 1, Figure 1 (a), that has little discernible asymptotic structure. Stronger bounds are derived when the uncertain semiflow exhibits additional long term structure. In Examples 2 and 3 shown in Figures 1 (b) and 1 (c), the semiflow is either supported on or attracted to a smooth manifold. In this latter case, we show that the error in approximations of the Koopman operator decays at a rate that depends on the fill rate of samples in the manifold. The best results are obtained when the limiting set $\Omega$ is a smooth manifold, and in this case it is shown that the finite-dimensional error in approximating the Koopman operator decreases like $O(h^{p}_{\Omega_n,\Omega})$. Here $h_{\Omega_n,\Omega}$ is the fill distance of the samples in the limiting set $\Omega$, and $p$ depends on the smoothness properties of the native space $H_X$.
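The roles of the sample sets $\Omega_n$, the fill distance $h_{\Omega_n,\Omega}$, and the space $H_{\Omega_n}$ of kernel translates can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the construction used in the paper: the limiting set is taken to be the unit circle in $\mathbb{R}^2$, the samples come from the orbit of an irrational rotation, the kernel is a Matérn-3/2 kernel defined on all of $\mathbb{R}^2$ and restricted to the circle, and the supremum in the fill distance is approximated over a fine reference grid. All function names are hypothetical.

```python
import numpy as np

def circle(theta):
    """Points on the unit circle, used here as a stand-in for the limiting set Omega."""
    return np.stack([np.cos(theta), np.sin(theta)], axis=1)

def pairwise_dist(a, b):
    """Euclidean distances between the rows of a (p, d) and the rows of b (q, d)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

def fill_distance(samples, reference):
    """Approximate h = sup_{x in Omega} min_{xi in Omega_n} |x - xi|.
    The sup is taken over a fine reference discretization of Omega."""
    return pairwise_dist(reference, samples).min(axis=1).max()

def matern32(a, b, length=0.5):
    """Matern-3/2 kernel on R^d (an assumed choice of K, not taken from the paper)."""
    r = np.sqrt(3.0) * pairwise_dist(a, b) / length
    return (1.0 + r) * np.exp(-r)

def interpolant(samples, values):
    """Interpolant of the data in H_{Omega_n} = span{K(xi, .) : xi in Omega_n}."""
    coeffs = np.linalg.solve(matern32(samples, samples), values)
    return lambda x: matern32(x, samples) @ coeffs

# Samples along the orbit of an irrational rotation, which is dense in the circle.
golden_angle = np.pi * (3.0 - np.sqrt(5.0))
orbit = circle(golden_angle * np.arange(80))
reference = circle(np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False))
g = lambda x: x[:, 0] * x[:, 1]  # a smooth observable restricted to Omega

sizes = [10, 20, 40, 80]
fills, errors = [], []
for n in sizes:
    omega_n = orbit[:n]  # nested sample sets Omega_n
    fills.append(fill_distance(omega_n, reference))
    g_n = interpolant(omega_n, g(omega_n))
    errors.append(np.max(np.abs(g(reference) - g_n(reference))))

for n, h_n, e_n in zip(sizes, fills, errors):
    print(f"n={n:3d}  fill distance={h_n:.4f}  sup error={e_n:.2e}")
```

Because the sample sets are nested, the computed fill distance cannot increase with $n$; the printed errors give only an empirical impression of how the approximation error shrinks with the fill distance and are not a substitute for the bounds derived in the paper.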
We introduce the motivation behind Koopman theory in Section 1.1. We review the goals and philosophy of the Koopman operator in Section 1.2. Related work and open questions are reviewed in Section 1.3. The strategy of the overall paper is summarized in Section 1.4. We give an overview of the new results derived in the paper in Section 1.5. The theory developed in the paper begins with a discussion of background material on RKH spaces, Sobolev spaces, and interpolation or projection operators in Section 2. Essential material on dynamical systems theory is contained in Section 3. Sections 4 and 5 give detailed accounts of the development of the newly introduced theory and the main technical results. Numerical studies and conclusions are presented in Sections 6 and 7, respectively.

Motivation
Koopman theory is a body of research that uses operator theory to study dynamical systems. Owing to the amenability to data-driven methodologies that emerged from Koopman theory, it has gathered attention from researchers across disciplines in the past decade [1-7]. While the original paper by Bernard O. Koopman [8] appeared early in the 1930s, a number of recent texts are evidence of the renewed interest in Koopman operator theoretic approaches. General studies of the Koopman operator can be found in texts on ergodic theory such as [9], and in the popular text [10] on deterministic or stochastic dynamical systems. Just over the past five years careful studies such as  have explored refinements of the theory of approximation of Koopman operators as well as associated data-driven modeling methods for uncertain dynamical systems. These references contain many individual contributions to the theory that underlies the approximation of Koopman operators. In view of the breadth of these studies on Koopman theory, a full and detailed account of the finer points studied in these papers would far exceed the limits of a single paper. Here we only review some of the broader issues and emphasize what we consider to be important open questions that remain, as of yet, largely unresolved. For the sake of researchers across diverse fields, we briefly review Koopman theory and its philosophy next so that this paper is self-contained. Such a review has been carried out at greater length and in more detail in a number of places, and we recommend discussions such as those in [34-37] for more extensive treatments.

The Philosophy of Koopman Theory
One of the principal reasons for the popularity of methods derived from Koopman theory is that they are amenable to data-driven approaches. In general, these studies build approximations of quantities or mathematical objects associated with an unknown flow from samples or observations. The approximations can take the form of estimates or predictors of the state, estimates of an observable function, or approximations of the propagation law of the dynamical system itself, among other examples. Moreover, it is not only the efficacy of data-driven algorithms that has resulted in such focus on Koopman theory. The theory is generally applicable to, indeed in a sense expressly designed for, the study of nonlinear systems. Koopman theory provides an elegant framework in which to carry out analysis of uncertain nonlinear dynamics as well as to develop data-driven algorithms for modeling and identification of such systems.
Because the general theory is so broadly applicable, a diverse set of definitions of a Koopman operator has appeared in the literature. These definitions are tailored to the class of uncertain dynamical systems under study. Variants are defined for deterministic and stochastic systems, and for both discrete time and continuous time evolutions. A good account of the variety of definitions can be found in [10], and a detailed discussion of the duality structures that accompany some of these representations can be found in [38]. Intuition about the nature of Koopman theory, its goals and strategies, is perhaps best achieved by considering a simple model of a discrete time, deterministic, nonlinear system. Such a typical system is governed by the equations
$$\phi_{n+1} = f(\phi_n), \qquad y_n = h(\phi_n), \tag{1}$$
where $\phi_n$ is the state at time $t_n \in \mathbb{R}^+ := [0, \infty)$ in the state space $X := \mathbb{R}^d$, $f : X \to X$ is an unknown, generally nonlinear function that determines the dynamics, the function $h : X \to \mathbb{R}^m$ is an observable or output function, and $y = h(x)$ is a measurement or observation evaluated at the state $x \in X$. Because it is normally the case that the function $f$ is unknown in applications of Koopman theory, we say that this is an example of an uncertain system. In Koopman theory, it is ordinarily assumed that the history of input-output observations $\{(\phi_i, y_i)\}_{i \in \mathbb{N}_0}$ has been collected, and it is desired to use this information to estimate properties of, or some mathematical object associated with, the flow. For this type of dynamical system, the Koopman operator $U_f$ is defined as a mapping that takes functions $g : X \to \mathbb{R}$ defined on (perhaps some subset of) the state space $X$ to another function $U_f g$ via the rule
$$(U_f g)(x) := (g \circ f)(x) = g(f(x)) \tag{2}$$
for all $x \in X$. In this definition note that the operator $U_f$ is characterized by the unknown function $f$ that appears in the evolution law: it is fixed in the definition above. It is now possible to define another discrete evolution law, one that defines the evolution of functions that satisfy
$$g_{n+1} = U_f g_n, \tag{3}$$
which can be used to investigate various properties of the original system in Equation 1. This evolution law determines a discrete, linear, generally infinite dimensional dynamical system, in contrast to the original system that is discrete, nonlinear, and finite dimensional. In the language of Koopman theory, Equation 3 is said to induce discrete linear dynamics in the "space of observables," or observable functions. We can fill in a few more details, at least for one iconic problem, to explain how the study of Equation 3 might inform the study of the original system in Equation 1. Suppose we set the number of outputs $m = d$, define the output function $h := (h_1, \ldots, h_d)^T$, and introduce the vector-valued Koopman operator
$$U_f h := (U_f h_1, \ldots, U_f h_d)^T.$$
If we select the observable function to correspond to the full-state observation function $h(x) = x = (x_1, \ldots, x_d)^T$, then, writing $x_n$ for the state $\phi_n$, we have
$$x_{n+1} = f(x_n) = h(f(x_n)) = (U_f h)(x_n).$$
We see from this identity that if we are able to build approximations of the Koopman operator $U_f$, it in principle enables the development of state estimators or predictors for the original nonlinear system. A similar strategy can be followed to study forecasting of the output $y_{n+1}$. Since we have $y_{n+1} = h(x_{n+1}) = h(f(x_n)) = (U_f h)(x_n)$, we see that approximations of the Koopman operator can be used to forecast the future output $y_{n+1}$ from the present state $x_n$.
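The composition rule that defines $U_f$, the one-step prediction identity for the full-state observable, and the linearity of $U_f$ can all be checked directly on a toy system. The sketch below is illustrative only: the map `f`, the observables `g1` and `g2`, and the helper `koopman` are hypothetical choices made for this example and do not come from the paper.

```python
import numpy as np

def f(x):
    """Hypothetical nonlinear map f : R^2 -> R^2 governing the dynamics."""
    return np.array([0.9 * x[0], 0.5 * x[1] + 0.2 * x[0] ** 2])

def koopman(g):
    """Return U_f g, the observable x -> g(f(x))."""
    return lambda x: g(f(x))

h = lambda x: np.asarray(x, dtype=float)  # full-state observable h(x) = x
g1 = lambda x: np.sin(x[0])               # two scalar observables
g2 = lambda x: x[0] * x[1]

x_n = np.array([1.0, -2.0])
x_next = f(x_n)

# One-step prediction: x_{n+1} = f(x_n) = (U_f h)(x_n).
prediction = koopman(h)(x_n)

# Linearity: U_f (a g1 + b g2) = a U_f g1 + b U_f g2, although f is nonlinear.
a, b = 2.0, -3.0
combo = lambda x: a * g1(x) + b * g2(x)
lhs = koopman(combo)(x_n)
rhs = a * koopman(g1)(x_n) + b * koopman(g2)(x_n)

print("prediction:", prediction, " true next state:", x_next)
print("linearity check:", lhs, rhs)
```

Here `koopman` is exact because `f` is known in closed form. In the uncertain setting discussed in the text, `f` is unavailable and the operator has to be approximated from samples.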
This simple model problem clearly illustrates a tradeoff that arises in Koopman theory and that largely motivates its use. The dynamics of nonlinear systems can be a much more difficult topic of study than those of linear systems. Koopman theory replaces the study of the original nonlinear system with that of a linear system. Still, as attractive as it is to study a linear system instead of a nonlinear system, there is at least one significant drawback here. The original system above is finite dimensional while the latter one induced by the Koopman operator $U_f$ is generally infinite dimensional. In fact, the Koopman operator $U_f$ that defines the dynamics in the space of observables depends on the unknown function $f$, and it therefore is also unknown when the dynamics are uncertain. Considerations of how to build approximations of the Koopman operator in finite dimensional spaces necessarily play an important role in Koopman theory, and the associated technical questions can be delicate and nuanced. It is the nature of the approximations of the Koopman operator that is the concern of the recent papers summarized above.
Any approximation of the Koopman operator necessarily entails a choice of the finite dimensional spaces of approximants, and the range of approximants that have been studied for this problem is vast. Common choices of bases include Fourier modes, globally supported polynomials, piecewise polynomials in the form of finite element functions or splines, wavelets, bases defined in terms of reproducing kernels of native spaces, as well as eigenfunctions of Koopman operators or approximations of eigenfunctions of Koopman operators. A discussion of the approximation spaces that can be defined in terms of some of these choices, and of how they influence convergence rates, is given by the authors in [38].
However, even in view of the diverse choices summarized above, perhaps it is most common to construct approximations of Koopman operators in terms of eigenfunctions of the Koopman operator, or in terms of approximations of eigenfunctions. The eigenfunctions, which are simply known as Koopman modes, are solutions of the (functional) eigenvalue problem for the Koopman operator. Just as the eigenvectors of the coefficient matrix of a linear, time-invariant system can be used to decouple, diagonalize, and simplify the study of that system, for some types of nonlinear systems a similar strategy in terms of the Koopman modes is possible. In essence, the Koopman modes sometimes can be used to transform the nonlinear system rigorously into a linear system. With this ideal scenario in mind, a common strategy for a nonlinear problem at hand is to generate a reduced order, finite dimensional model by the superposition of a finite number of Koopman modes. When the calculation of the Koopman modes is problematic, either analytically or computationally, alternatives rely on computing approximations of the eigenfunctions. Establishing for which systems this transformation can be accomplished exactly, or approximately with sufficient accuracy, and assessing the magnitude of any resultant errors, is a very active area of research. The studies [11-13, 16, 22-24, 26, 27, 31, 38, 39] all investigate aspects of how to go about generating approximations of Koopman modes, which has become a topic of interest in and of itself.

Broader Issues and Open Questions
As suggested in the above discussion, the fine details of the theory advanced in [11-20, 22-33, 40] can vary a lot from reference to reference. Here we identify a few of these recent publications that, in the authors' view, appear most relevant to this paper.

Most Relevant Recent Work
The early work by [31] studies the relationship of Koopman approximations to matrix representations that appear in the dynamic mode decomposition (DMD) and Extended DMD (EDMD) algorithms, and uses the algorithms to estimate the Koopman modes. Reference [39] analyzes approximations of the Koopman and Perron-Frobenius operators, their relationship to the EDMD algorithm, and interprets them as different types of Galerkin approximations. A central conclusion of [39] is that as the number of samples $m$ increases, a finite dimensional approximation of the Koopman operator converges over a fixed finite dimensional subspace of approximants. That is, if $n$ is the dimension of the space of approximants or of the basis, and $m$ is the number of samples, this paper gives sufficient conditions to ensure that when $n$ is fixed, convergence is guaranteed as $m \to \infty$. Examples are studied in which the bases of the approximant spaces are selected to be globally defined monomials. Reference [21] builds upon the study in [39] and further explores the approximation of transfer operators, and in particular the Koopman and Perron-Frobenius operators. The relationships among various approximations, the DMD method, the EDMD methods, the time-lagged independent component analysis (TICA) method, and the variational approach to conformation dynamics (VAC) are discussed.
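To make the roles of the basis dimension $n$ and the number of samples $m$ concrete, the following is a minimal sketch of an EDMD-style least-squares fit with a monomial dictionary. It is a generic illustration and not the specific algorithm analyzed in [31] or [39]: the map `f`, the parameters `lam` and `mu`, the dictionary, and the sampling scheme are all assumptions chosen so that the span of the dictionary is invariant under the Koopman operator, in which case the fitted matrix can be checked against known eigenvalues.

```python
import numpy as np

lam, mu = 0.9, 0.5  # assumed parameters of the toy map

def f(x):
    """Hypothetical map whose Koopman operator leaves span{x1, x2, x1^2} invariant."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.stack([lam * x1, mu * x2 + (lam**2 - mu) * x1**2], axis=1)

def dictionary(x):
    """n = 3 monomial basis functions psi = (x1, x2, x1^2), evaluated row-wise."""
    return np.stack([x[:, 0], x[:, 1], x[:, 0] ** 2], axis=1)

def edmd(x, y):
    """Least-squares matrix K with dictionary(x) @ K ~= dictionary(y)."""
    K, *_ = np.linalg.lstsq(dictionary(x), dictionary(y), rcond=None)
    return K

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 2))  # m = 200 samples of the state
y = f(x)                                   # their images under the map

K = edmd(x, y)
eigenvalues = np.sort(np.linalg.eigvals(K).real)
print("estimated eigenvalues:", eigenvalues)
print("expected (mu, lam^2, lam):", np.sort([mu, lam**2, lam]))
```

Since the dictionary spans an invariant subspace for this particular map, the fit is exact up to rounding. For a general nonlinear system no finite monomial dictionary is invariant, and the approximation error then depends on both $n$ and $m$, which is the situation the convergence results cited above address.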
The recent reference [41] further studies the convergence of approximations of Koopman operators in terms of projections on the space $L^2_\mu(X)$, with $X$ the state space of the semiflow. Several conclusions regarding convergence as the dimension $n$ of the space of approximants or the number $m$ of samples approaches infinity are made, but again, no discussion about rates of approximation in terms of fixed numbers $m$ of samples or fixed dimensions $n$ of the approximant space is given.
Other references that are particularly relevant to this study include the quite recent efforts in [12, 42-44]. These papers explore methods for framing the approximation of Koopman operators in terms of bases built from kernels of RKH spaces. The paper [42] studies the approximation of the Koopman and Perron-Frobenius operators and concentrates on the realization of approximations in terms of kernels of RKH spaces. In contrast to many of the articles on the approximation of Koopman operators, reference [42] discusses probabilistic rates of convergence of various approximations introduced for the regularized, stochastic Koopman operators. Specifically, it is shown in Theorem 3.14 that, for systems in which the samples are independent and identically distributed (IID), approximations converge in probabilistic order $O_P(m^{-1/2} \epsilon^{-1})$, with $m$ the number of samples and $\epsilon$ the regularization parameter used to define the approximation to the Koopman operator. (It should be noted that this estimate is derived by choosing $m = n$, much as we do later in our data-driven approximations of the Koopman operator.) This bound is derived for IID processes, which limits its applicability to cases when the samples are generated along the sample path of many nonlinear stochastic recursions. This result should be contrasted with Section 8 of [38], which is based on concentration of measure inequalities for certain types of dependent or independent processes; the latter yields error estimates for approximations of stochastic Koopman operators that are explicit in the number of samples $m$ and the dimension of the space of approximants $n$.
The approach in [12] uses diffusion eigenfunctions that are intrinsic to the manifold $M$ over which evolutions take place. In particular, the eigenfunctions of the Laplace-Beltrami operator over $M$ are employed as the basis for theoretical considerations. In this sense the method can be understood as an intrinsic method, which we discuss in further detail later in this paper. Since the construction of the intrinsic basis over the manifold is impossible when the manifold is unknown, data-driven approximations of the unknown intrinsic kernel are computed using a variable bandwidth kernel approximation defined over $\mathbb{R}^d$. In some sense, the results described in the current paper can be viewed as a method to obtain explicit bounds on the error of approximation in a similar setup. Specifically, in this paper we make precise when we can restrict a kernel defined over the state space $\mathbb{R}^d$ to a particular limiting subset $\Omega$ (which may be a manifold) and derive associated rates of convergence. By employing trace theorems for Sobolev spaces that are regular enough to be RKH spaces, it is possible to make explicit predictions of rates of convergence in terms of the fill distance, a topic not pursued in [12].
The authors of [43] study approximations of Koopman operators for some unitary groups, again constructing approximations in terms of bases for RKH spaces over a smooth manifold $M$. An important contribution of this paper is the choice of the RKH framework that induces dynamics described by the unitary group $e^{tW_\tau}$ for a compact, self-adjoint operator $W_\tau$ that is amenable to approximation. Overall, the analysis in the paper investigates the convergence of spectral approximations of Koopman operators, and it explores the convergence of their associated kernel integral operators over $L^2_\mu(X)$. Typical results (see for example, Theorem 21) ensure the convergence of approximate eigenfrequencies, approximate eigenfunctions, and observables as the dimension $n$ of the subspace of approximants increases to infinity. Again, to emphasize the contrast to results in this paper, reference [43] does not study the rate of convergence, nor the error in, say, $L^2_\mu(M)$, of approximations of the Koopman operator that is induced by a particular choice of a kernel of the RKH space. Different choices of kernels will induce different rates of convergence, and it would be a great advantage to understand how to control or influence the error by virtue of selecting a different basis.
Reference [45] and Sections 8.3 through 8.5 of [38] are noteworthy to the current paper because they emphasize the relevance of a large body of research on statistical learning theory, approximation theory, and nonlinear regression that is pertinent to Koopman theory. As shown in [38], the conventional result whereby the EDMD algorithm is used to generate a data-driven approximation of the Koopman operator can be identified, in some situations, with the estimates that result from the method of empirical risk minimization in statistical learning theory. In fact, this equivalence can be used to derive rates of convergence for EDMD-based approximations of the Koopman operator that improve some of the convergence guarantees that have been derived in recent literature. A similar strategy is explored in [45], although the convergence results stated are not as sharp. Reference [45] casts the problem of approximating the Koopman operator as a classical empirical risk minimization problem, chooses a basis in terms of a reproducing kernel, and subsequently shows in Theorem 1 that approximations converge in the Lebesgue space $L^2_\mu(X)$, with $X$ the state space of the system. As outlined in Section 3 of [45], rates of convergence follow in principle by making assumptions about the rate of decay of the eigenvalues that describe the reproducing kernel. See also [38] for how this strategy is equivalent to the definition of linear, spectral approximation spaces that characterize the rates of convergence.
The authors in [38], as noted above, discuss rates of convergence for approximations of Koopman operators in terms of spectral approximation spaces. These spaces are defined via the eigenfunctions of integral operators that are defined in terms of the kernel of an RKH space. As suggested above, this is a well-known technique to study the approximation rates of functions. It is widely used when the eigenfunction basis is known, or at least can be readily approximated. However, it cannot be emphasized enough that computing a basis that is intrinsic to some complex underlying set $\Omega$, or even a smooth manifold $M$, can be extraordinarily difficult. The authors in [38] discuss rates of convergence for some common choices of bases including normalized B-splines, piecewise polynomials, orthonormal wavelets, and orthonormal multiwavelets. But the analysis there is carried out assuming that the bases used for approximations are "well-aligned" with the limiting set $\Omega = X$. The definition of such nice bases may be possible in some highly structured cases, such as flows on a circle, a sphere, or on the torus. But it may be far from trivial to come up with such well-designed, domain-adapted bases in typical applications of Koopman theory since the domain over which samples are collected is unknown, has a complicated structure, or both.

Open Questions
Considered as an entire class, there are some important subproblems in the approximation of Koopman operators that are not covered in a systematic way in the literature so far. We can summarize some of these open questions by being a bit more precise about the structure of the problem that arises when trying to approximate a Koopman operator. The general setup in this paper is described in terms of the following key features that are used to frame the approximation problem. We are given a continuous semiflow in either discrete or continuous time over the state space $X$. In some situations, the state space may be compact but in many cases it is not. A set of samples $\Xi := \bigcup_{n \in \mathbb{N}} \Omega_n$ is collected that is dense in a "limiting subset" $\Omega \subseteq X$. Here $\Omega_n$ is a finite set of $n$ samples that is used to build a particular finite dimensional approximation of the Koopman operator. The limiting set $\Omega$ may, or may not, be the entire state space $X$. A finite dimensional space $H_{\Omega_n}$ of approximants is defined in terms of a basis $\{\psi_i\}_{i=1,\ldots,n}$, and this basis is used to fashion an approximation of the Koopman operator. The basis may be supported on $\Omega \subset X$, or it may be supported on all of $X$. This means that we can discuss approximation of Koopman operators on functions supported on the entire state space, or we can talk about the approximation of Koopman operators over functions supported on a proper subset $\Omega \subset X$.
We can gain an appreciation of some of the open problems that motivate this paper by considering a few of the most common scenarios. Perhaps most frequently, it is simply assumed that the samples $\Xi$ are dense ($\Xi \subset X$ and $X \subset \overline{\Xi}$) in the compact state space $X$ over which the evolution is defined. In this event, we have the special case in which the limiting set $\Omega = X$ is known a priori. By definition, the basis functions could be any of a large number of possible choices for functions over $\Omega = X$, including splines, polynomials, Fourier modes, wavelets, or kernel functions. It is frequently the case in the papers above that the basis $\{\psi_i\}_{i=1}^{n}$ is a collection of eigenfunctions of the Koopman operator, which are then necessarily functions over the set $X$. It is also commonly the case that the basis functions are in fact other types of eigenfunctions, associated with different differential operators or semigroups, like the heat kernel over $X$. For any of these choices of bases, in the highly idealized scenarios when samples fill the state space $X$, it is still seldom the case that the rates of convergence of the approximation of Koopman operators are related to the specific choice of the bases $\{\psi_i\}_{i=1,\ldots,n}$ defined over the set $X$. It is much more commonly the case that the approximations are shown to converge in terms of $L^2_\mu(X)$, for instance, but the rates of convergence are not derived based on the choice of bases $\{\psi_i\}_{i=1,\ldots,n}$ that are contained in $L^2_\mu(X)$. This situation for the approximation of Koopman operators should be contrasted with studies of rates of convergence for approximations of solutions of partial differential equations (PDEs). A few decades of study have developed a rich theory that relates approximation theory, the definition of (finite element or finite volume or spline or wavelet or kernel or eigenfunction) bases, and the rates of convergence of approximations of solutions of PDEs. To be sure, some of the standard machinery for building approximations of PDEs by refinements or enrichments over fixed, bounded domains is either unsuitable or would be difficult to apply to approximations in Koopman theory. Even if the state space $\Omega = X$ is known a priori, the distribution of samples over $X$ is generally not known, and it is the distribution $\mu$ in $L^2_\mu(X)$ that intuitively would guide enrichment, refinement, or other forms of basis selection over $X$. The difficulty of course is that the portion of the domain over which samples are collected or concentrated is inextricably connected to the dynamical system under study. Moreover, sometimes approximations are sought to induce a dynamics that is somehow consistent with the original semiflow, and this topic does not seem to have a precedent in most methods for approximating solutions of PDEs. All of these factors combine to make it more challenging to derive rates of convergence of Koopman approximations. Still, there is a notable lack of results on rates of convergence of Koopman operator approximations, even in this simplest scenario when samples are dense in $X$. While these results on convergence may be harder to come by, or to prove, there definitely is a need for further study along these lines.
Perhaps more importantly, when the limiting set $\Omega \neq X$ is a proper subset of the state space $X$, the problem of proving convergence of approximations is still more subtle. From a practical point of view, this is an important issue. The driving motivation behind the use of Koopman theory in many real applications is that it is known, or it is believed, that the high dimensional dynamics on the state space $X$ actually corresponds to a dynamics on a simpler or smaller subset in a particular motion regime or case under study. Perhaps the trajectory at hand actually evolves over a "smaller" invariant set, or perhaps the trajectories over time accumulate near such a set. The goal of Koopman theory in such a case is to provide a principled, data-driven method to build a model having reduced complexity. It would certainly be valuable to characterize the rates of convergence in this situation, but part of the problem is choosing bases and associated spaces of approximants that provide a tractable setting for analysis over the unknown set $\Omega$. This is complicated by the fact that the limiting set $\Omega$ may have no apparent or particularly notable structure, or it can have a high degree of regularity when measured in a certain sense. As noted on page 341 of [46], the case when the limiting subset is a proper subset of the state space is a common and difficult case that can pose its own set of difficulties. The limiting set $\Omega$ may have zero measure as a subset of $X$, and choosing function spaces over $\Omega$ that are well-defined can be a difficult task in its own right. In any event, when the limiting set $\Omega$ is an unknown proper subset of $X$, the approximation problem is more complex. We can define the basis used for approximations over the limiting subset $\Omega$, or we can define the basis for approximation over $X$. In both cases we want to understand how the choice of bases affects the convergence rate, which function spaces are appropriate for estimating errors, and how the approximation results over $\Omega$ or over $X$ might be related.
Before proceeding to a summary of the new results of this paper, we would like to emphasize one final source of complication when putting together a general approach to the approximation of Koopman operators. A data-driven approximation of the Koopman operator will always be a function of the number of samples $m$ of the observations and the dimension $n$ of the space of approximants. For Koopman operators associated with stochastic flows, the error in approximations of the Koopman operator will typically consist of a bias term and a variance term: this is a fundamental decomposition of the error when samples are generated according to a stochastic process. The bias term can be traced to the error that is induced by the specific choice of the space of approximants, and it is also known as the approximation space error. The variance term is associated with the stochastic nature of the system. A simple overview of the breakdown of the error in approximating the Koopman operator in terms of the bias and variance is given in [38]. The approximation space or bias error decreases as the dimension $n$ of the space of approximants is increased. The stochastic error is typically a complicated function of the dimension $n$ and the number of samples $m$. However, for a fixed number of samples $m$, the variance error term increases as $n \to \infty$. The important issue to be remembered regarding this decomposition is that the bias-variance tradeoff implies that the dimension of the approximant space $n$ must be adapted as a function of the number of samples $m$ to achieve an optimal error reduction.
This paper considers only deterministic systems, which means that some of these finer issues regarding the balance of probabilistic and deterministic errors do not arise.

Strategy of this Paper
In short, the purpose of this paper is to provide a general and systematic framework for the construction of approximations of Koopman operators for deterministic systems, one that can be applied either when the samples are dense in the entire state space Ω = X, or when samples are dense in a limiting subset Ω ⊂ X that is properly contained in the state space. The goal of the theory is to generate approximations for which the rates of convergence can be studied for a wide class of Koopman operators and bases {ψ i } i=1,...,n . We want to explore how the rates depend on the choice of function spaces, the limiting set Ω, and the choice of bases that define the space of approximants.
The analysis in this paper differs in two essential ways from many approaches to the approximation of Koopman operators that have appeared over the past few years. First, the study makes a careful use of priors to establish rates of convergence that depend on the smoothness and approximation properties of the observables on which the Koopman operator acts. Priors in this paper refer to membership in certain native spaces of reproducing kernels, or in some cases to Sobolev spaces that measure a scale of smoothness s. The parameter s can likewise be interpreted as describing the approximation properties of the functions in the Sobolev space. This is a standard framework in which other types of general (nonparametric) estimation problems are cast, such as in the study of (distribution-free) learning theory as it is applied to nonlinear regression [47][48][49][50][51]. In this paper we demonstrate the strength of such an approach when it is applied to Koopman theory also. A pragmatic implication of this framework is that we are able to derive strong error bounds in a number of different, but common, scenarios that arise in the approximation of Koopman operators. Overall, we can view the current paper as a continuation of the study and approach in [38]. That reference studies rates of convergence for some common choices of basis functions like splines, orthonormal wavelets, and orthogonal eigenfunctions of integral operators that are often associated with the kernels of RKH spaces. The results in [38] apply when the bases are "well-aligned" with the domain Ω = X. Here, in contrast, the emphasis is on bases built from the translates of the kernel functions of an RKH space, and much of the theoretical background is explained by methods of scattered data approximation. We show that the choice of bases in terms of scattered data samples is a powerful tool for the derivation of rates of approximation of Koopman operators, in that the bases are not required to be adapted to the domain.
The second distinguishing feature of the approach in this paper is the investigation of the way in which the structure of the underlying set on which the dynamics is supported or accumulates plays a critical role in the analysis of Koopman operator approximations. The authors feel that in fact it is this dependence on the set that supports the underlying dynamics that fundamentally differentiates approximation in Koopman theory from other fields of study in estimation and approximation theory like nonlinear regression. We will see that ultimately the interplay between the dynamics and rate of convergence of approximations of the Koopman operator is described by the rate at which the samples fill the limiting set. To the authors' knowledge, this is a qualitatively new result in the analysis of approximations of Koopman operators.
Consideration of a few examples can illustrate the breadth of behavior that the authors feel that approximation of the Koopman operator should take into account. The examples below are meant to emphasize an important feature about data-driven models of dynamical systems: they are often used to capture the "underlying physics" of some high-dimensional flow. In rough terms this means that although the state space may have many degrees of freedom, there actually is a low dimensional mathematical structure lurking in the larger space. Although the examples are only two dimensional, they each illustrate how the limiting behavior of the flow can exhibit quite a bit of structure. They are also suggestive of how error bounds for approximation of Koopman operators reflect or depend intrinsically on that underlying structure. While discrete semiflows with the form of Equation 1 are more common in the existing studies about Koopman operators, we will also treat orbital Koopman operators, defined in Section 3.1, that are associated with semiflows in continuous time such as the first three examples below. Of course, discretization in time of the continuous time dynamical systems in these examples could also define discrete time flows of the type in Equation 1.

Example 1 (Example 2.1, [52])
The following system of equations represents the model of a tunnel-diode circuit. Figure 1, (a) depicts trajectories for several initial conditions of the flow, with u = 1.2, R = 1.5 × 10^3, C = 2 × 10^−12, L = 5 × 10^−6, and h(ϕ_1) := 17.76ϕ_1 − 103.79ϕ_1^2 − 229.62ϕ_1^3 − 226.31ϕ_1^4 + 83.2ϕ_1^5. This example has been selected since the phase portrait is highly complex and does not exhibit any recognizable "simple underlying structure" of the kind that will feature prominently in the next few examples. Samples Ξ(ϕ 0 ) and Ξ(ψ 0 ) that are collected along two distinct trajectories are shown in the figure. The positive limit sets consist of a finite collection of isolated points in this example, and the qualitative asymptotic behavior of a trajectory can vary drastically depending on the location of the initial condition. We are interested in obtaining precise descriptions of convergence of Koopman operator approximations built from samples such as these two sets, which we intuitively expect to be quite different finite dimensional operators. In addition, we are interested in understanding over what spaces of functions the Koopman approximations converge, how these spaces compare, what convergence results can be stated if the two sets are merged, and so forth. In the current example, the emphasis would be on identifying useful function spaces that are well-defined for the problem of estimating error in this "unstructured" scenario.

Example 2
This semiflow exhibits behavior that is qualitatively quite different from that in Example 1 above. There is a unique limit cycle for this flow, and for any ϕ 0 on the limit cycle, the orbit Γ + (ϕ 0 ) is a smooth, compact manifold M ⊂ R 2 . However, for other initial conditions ϕ 0 , the orbit Γ + (ϕ 0 ) is not a smooth compact manifold.
It is easy to see from the phase portrait that the limit cycle M attracts the trajectory generated by any initial condition as t → ∞. If we have two different initial conditions ϕ 0 and ψ 0 and generate two different samplings Ξ(ϕ 0 ) and Ξ(ψ 0 ), then the limiting sets Ξ(ϕ 0 ) and Ξ(ψ 0 ) are different sets. It is not hard to imagine that finite dimensional approximations, especially those constructed from samples during the "transient regime," could be substantially different. However, both sets Ξ(ϕ 0 ) and Ξ(ψ 0 ) contain the same positive limit set M. This paper makes precise in what sense the approximations of the Koopman operator generated from Ξ(ϕ 0 ) and Ξ(ψ 0 ) are comparable: they converge in some sense to the same operator restricted to the RKH space H M that is indexed by the manifold M. Furthermore, if the RKH space H M happens to be a Sobolev space, we derive sufficient conditions to guarantee rates of convergence of approximations of the Koopman operator. In this paper we will describe methods in which approximations of the Koopman operator are derived in terms of intrinsic kernels that depend on knowledge of the manifold M and extrinsic kernels that do not require knowledge of the underlying manifold M.

Example 3
In Figure 1, (c), the trajectories of the Lotka-Volterra predator-prey model given by the system are depicted. The phase portrait illustrates the fact that this system exhibits a great deal of structure, in some sense perhaps more than the other examples considered thus far. For an infinite number of choices of initial conditions ϕ 0 , the orbit Several samples are shown along one trajectory, where the orange points denoted with a '.' are collected during the first transit around the limit cycle, then the samples denoted by purple squares are added during the second transit around the limit cycle, and so forth. The collection of samples Ξ := Ξ(ϕ 0 ) := {ϕ i } i∈N0 := {ϕ(t i )} i∈N0 can be built progressively as t i → ∞ so that M := Ξ. Basic questions arise that can be traced to the nature of the set Ω. For instance, if we build an approximation U n ϕ of the Koopman operator U ϕ in terms of the sampling Ξ(ϕ 0 ), for what functions g is U n ϕ g a good approximation of U ϕ g? We can state this another way: over what subspace of functions is U n ϕ a good approximation to U ϕ ? This paper will show that since M is a smooth manifold, we can get precise answers to this question for some approximations of the Koopman operator.

Figure 1: (b) A flow having "asymptotic structure," where trajectories accumulate on the positive limit set, which is an invariant set of the flow. (d) Samples taken from a geometric numerical integrator scheme. The integration scheme determines discrete flows of orbits defined by 1-dimensional manifolds. The underlying manifolds are completely determined by the initial conditions. For certain orbits that do not lie on the same manifold, the estimates will never converge to one another as t → ∞.

Example 4
Here we briefly overview the geometric method of numerical integration [53] that is often used to generate simulations that preserve certain qualitative behaviors of systems. We assume the discrete evolution used for numerical integration is an approximation of the underlying continuous one. Given state variables ϕ ∈ M, where M is a d-dimensional differentiable manifold, the tangent space to M at a point ϕ ∈ M is denoted by T ϕ M. For a curve t → ϕ(t) on the manifold M, the velocity ϕ̇(t) ∈ T ϕ(t) M at a time t is tangent to the manifold M. The tangent bundle T M is the union of all of the tangent spaces to M, which serves as the state space of (ϕ(t), ϕ̇(t)). In analogy to the ODEs in Examples 1, 2 and 3, the continuous evolution law on the manifold M is governed by the equation ϕ̇(t) = F(ϕ(t)), where the map F : M → T M. It defines a semigroup {S(t)} t≥0 on the manifold M so that the solutions are given by ϕ(t) = S(t)ϕ 0 . In this case, we can discretize the semiflow and define the familiar discrete nonlinear recursion ϕ i+1 = f (ϕ i ), where f (ϕ i ) = S(h)ϕ i with a fixed timestep h. The theoretical results of this paper, for instance, can be applied when the system eventually approaches a limit set Ω which is itself a manifold. In practice, a closed form expression for the semigroup S(t) is not available for many systems of interest. Geometric integration methods [53] are popular methods for approximating the system on the manifold M by a discrete evolution law. For example, we consider the evolution corresponding to the dynamics of a pendulum with a mass m = 1 hanging from a massless rod of length l = 1 that is approximated using the Störmer-Verlet discrete integration scheme as given in Example 1.4 in [53]. In this example, ϕ 1 and ϕ 2 correspond to the pendulum's momentum and position, respectively. The equations of motion are given as The discrete evolution at the i th step is determined from the pendulum's position ϕ 2,i , as well as approximations ϕ 1,i and ϕ 1,i+1/2 of the derivative ϕ 1 = ϕ̇ 2 at the i th and i + 1/2 step. The approximations are calculated by introducing the difference equations Note that in this formulation ϕ̈ 2 = f 1 (ϕ 2 ), which can be approximated by a second-order difference quotient Using the positions ϕ 2,i and the momentum ϕ 1,i along with the second derivative f 1 (ϕ 1,i ) and Equations 4, 5 and 6, we can determine a one-step evolution f : X → X to get {ϕ 1,i+1 , ϕ 2,i+1 } defined by the following recursion: Like the previous example, for any ϕ 0 , the orbit In Figure 1, (d), several samples Ξ(ϕ 0 ) denoted by the orange circles are collected along the discrete flow determined by the geometric numerical integration scheme of Equations 7 through 9 starting from initial condition ϕ 0 . These samples actually come from a limit cycle defined by a 1-dimensional manifold M ϕ0 which contains ϕ 0 . Similarly, samples Ξ(ψ 0 ) denoted by purple squares are collected around a separate limit cycle defined by the manifold M ψ0 , where the integration starts at a different initial condition ψ 0 . Samples Ξ(ϕ 0 ) and Ξ(ψ 0 ) will never intersect as they are confined to the manifolds M ϕ0 and M ψ0 , respectively, as t → ∞. Again, we are interested in understanding in what spaces approximations of the Koopman operator converge for such systems.
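The discrete evolution in Example 4 can be illustrated with a minimal sketch of the standard Störmer-Verlet step for the pendulum. The sketch assumes unit mass, unit length, and a unit gravitational constant, so that f 1 (ϕ 2 ) = −sin ϕ 2 ; this normalization, the function names, the timestep, and the initial conditions are illustrative assumptions and are not taken from Equations 4 through 9.

```python
import math

def stormer_verlet_step(phi1, phi2, h):
    """One Stormer-Verlet step for d(phi1)/dt = -sin(phi2), d(phi2)/dt = phi1.

    phi1 is the momentum and phi2 the position, following Example 4.
    """
    phi1_half = phi1 - 0.5 * h * math.sin(phi2)            # half-step momentum update
    phi2_next = phi2 + h * phi1_half                       # full-step position update
    phi1_next = phi1_half - 0.5 * h * math.sin(phi2_next)  # second half-step momentum update
    return phi1_next, phi2_next

def sample_orbit(phi1_0, phi2_0, h, n):
    """Collect the n + 1 samples generated by the recursion from the initial condition."""
    samples = [(phi1_0, phi2_0)]
    for _ in range(n):
        samples.append(stormer_verlet_step(*samples[-1], h))
    return samples

def energy(phi1, phi2):
    """Pendulum energy; each level set is a 1-dimensional curve in the phase plane."""
    return 0.5 * phi1 ** 2 - math.cos(phi2)

# Two hypothetical initial conditions that lie on different energy level sets.
orbit_a = sample_orbit(0.0, 1.0, h=0.1, n=2000)
orbit_b = sample_orbit(0.0, 2.0, h=0.1, n=2000)
energies_a = [energy(p, q) for p, q in orbit_a]
energies_b = [energy(p, q) for p, q in orbit_b]
drift_a = max(abs(e - energies_a[0]) for e in energies_a)
```

In this sketch the energy along each sampled orbit stays close to its initial value, so the two sample sets remain near two distinct closed curves, which is the qualitative behavior described for Ξ(ϕ 0 ) and Ξ(ψ 0 ).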

A Review of the New Results
The systems for which the theory in this paper applies include some types of discrete or continuous semidynamical systems that evolve on a complete metric space (X, d X ). Most of the examples choose X as Euclidean space X := R d or as a compact Riemannian manifold M. Some examples are also described when the system evolves on or is attracted to certain types of k-dimensional smooth manifolds M that are regularly embedded in Euclidean space, M ⊂ X := R d . These latter cases are important in applications, such as illustrated in the examples, since it is not uncommon that samples Ξ := Ξ(ϕ 0 ) are generated that fill sets that have zero measure as subsets of the state space X := R d .
Similar error bounds are derived in this paper for the two types of deterministic dynamical systems, both of which are very general. An evolution law like Equation 1 is the exemplar for the case with discrete dynamics determined by a mapping f. For such systems it is well-known that the Koopman operator U f is given as shown in Equation 2 with (U f g)(x) = (g • f )(x) for each x ∈ X. We also define a type of Koopman operator for semidynamical systems in continuous time. If t → ϕ(t) ∈ X is a trajectory of the system through the initial condition ϕ 0 ∈ X, we define the orbital Koopman operator U ϕ from the identity (U ϕ g) The set of all samples collected along the flow is denoted by Ξ := ⋃ n∈N Ω n , with Ω n := {ξ i | 1 ≤ i ≤ n} the finite set of samples used to construct a particular approximation of the Koopman operator. The analysis of approximations of the Koopman operators is carried out by identifying a limiting subset Ω ⊆ X in which the samples Ξ are dense. The study of approximations in the paper progresses from simple, rather unstructured cases to those that are more structured and can be considered as direct implications of well-known results on scattered data interpolation [57]. The study begins with the definition of a reproducing kernel K : X × X → R that induces the RKH space H X := span{K ξi | ξ i ∈ X} of functions over the state space, which is the largest set of functions used to formalize the approximation problem. In this definition K ξi := K(ξ i , •) is the kernel function centered at ξ i ∈ X. See Section 2.2 for the properties of such spaces and associated definitions. Finite dimensional spaces of approximants H Ωn are defined as finite spans of collections of such bases centered over scattered data as in H Ωn := span{K ξi | ξ i ∈ Ω n }, and the convergence of the approximate Koopman operators is cast in terms of the closed subspace H Ω := span{K ξi | ξ i ∈ Ω} that is generated or indexed in terms of the limiting set Ω ⊆ X.

Projection Approximation U n f Without Regularity Assumptions on Ω
The first and simplest approximation U n f of the Koopman operator U f in this paper is defined in terms of the H X -orthogonal projection P Ωn : H X → H Ωn , and we set U n f g := (P Ωn g) • f. Theorems 1 and 2 together imply that we have the error bound Roughly speaking, this convergence result is similar in nature to that in Theorem 3 of [41], which shows that as the dimension of the space of approximants goes to infinity, the approximate Koopman operator U n f approaches the true Koopman operator U f in the strong operator topology over L 2 µ (X). The equation above is entirely analogous and implies that convergence is guaranteed in the strong operator topology on maps from H Ω → C(X). Neither the result in [41] nor the analysis in Theorems 1 and 2 of this paper describes the rates of convergence for different choices of spaces of approximants, however. They only imply convergence as the dimension of the space of approximants approaches infinity. A quick inspection of this pair of theorems shows that Theorem 1 implies the leftmost inequality above, while Theorem 2 implies that ∥(I − P Ωn )g∥ H X → 0 for g ∈ H Ω . This latter limit is shown to hold if the kernel K that induces the native space H X has strong separation properties. These properties are known to hold for a large collection of kernels such as the Matern-Sobolev kernels, the Abel kernel, or the ℓ 1 -exponential kernel. However, the Gaussian kernels are known to lack the separation properties assumed in Theorem 2 [58]. Thus, while very popular, the Gaussian kernels are not guaranteed by our analysis to generate such convergent approximations: in their case, other types of analysis are needed.
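A minimal numerical sketch of the projection approximation U n f g := (P Ωn g) • f is given below. It relies on the standard fact that, for g in the native space, the H X -orthogonal projection onto the span of kernel translates coincides with kernel interpolation at the centers. The one-dimensional Abel-type kernel, the map f, the observable g = sin, and the equally spaced centers are all hypothetical choices made only for illustration, and the small linear solver is included only to keep the sketch dependency-free.

```python
import math

def kernel(x, y, alpha=1.0):
    """Abel-type kernel exp(-alpha * |x - y|) on the real line (illustrative choice)."""
    return math.exp(-alpha * abs(x - y))

def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= factor * M[k][col]
    c = [0.0] * n
    for k in reversed(range(n)):
        tail = sum(M[k][j] * c[j] for j in range(k + 1, n))
        c[k] = (M[k][n] - tail) / M[k][k]
    return c

def project(g, centers):
    """Return P g, the element of span{K_xi} that interpolates g at the centers."""
    gram = [[kernel(a, b) for b in centers] for a in centers]
    coeffs = solve(gram, [g(x) for x in centers])
    return lambda x: sum(c * kernel(xi, x) for c, xi in zip(coeffs, centers))

def f(x):
    """Hypothetical discrete map on [0, 1], assumed known for this sketch."""
    return 0.5 * x + 0.25

def sup_error(n_centers, g=math.sin, n_test=201):
    """Largest |(U^n_f g)(x) - (U_f g)(x)| over a uniform test grid in [0, 1]."""
    centers = [i / (n_centers - 1) for i in range(n_centers)]
    projected_g = project(g, centers)
    grid = [j / (n_test - 1) for j in range(n_test)]
    return max(abs(projected_g(f(x)) - g(f(x))) for x in grid)

err_coarse = sup_error(6)   # centers with spacing 0.2
err_fine = sup_error(41)    # centers with spacing 0.025
```

Comparing `err_coarse` and `err_fine` gives an informal view of how the error shrinks as the centers fill the interval; it is not a substitute for the bounds in Theorems 1 and 2.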

Projection Approximations U n f , Ω a Regular set or Manifold
In this paper, the rather weak results in Theorems 1 and 2 for "unstructured" limit sets Ω ⊆ X are improved when either the limiting set Ω ⊂ X is a compact set having a Lipschitz boundary, or when Ω := X := M is in fact a smooth Riemannian manifold. Theorem 11 gives sufficient conditions that the projection-based Koopman operator U n f (•) := (P Ωn (•)) • f satisfies a bound such as for all g ∈ W t,2 (M ) when the limiting set Ω is in fact a smooth, connected, compact, Riemannian manifold Ω := M. In this equation the error is measured in the pullback space f * (W s,2 (M )) that is defined in Section 2.2, the parameter h Ωn,M is the fill distance of the finite samples Ω n in the manifold M, and the ranges for the indices t, s are dictated by the Sobolev embedding theorem and the many zeros theorem on manifolds (see Appendix). When the samples Ξ are dense in a limiting set Ω ⊂ X that is sufficiently regular, in the sense that it is compact and has a Lipschitz boundary, sufficient conditions are also given for the similar bound for all g ∈ W t,2 (Ω). Pointwise error bounds that are qualitatively similar to the last two Equations 10 and 11, ones that also bound the pointwise difference | in terms of a power of the fill distance, are given in Theorems 4, 5, and 6.
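Since the bounds above are expressed in powers of the fill distance, a short sketch of how that quantity can be estimated may be helpful. It assumes the usual definition from scattered data approximation, h := sup over x ∈ Ω of the distance from x to the nearest sample in Ω n , and it approximates the supremum by a maximum over a fine reference set. The unit circle as the set Ω, the Euclidean (chordal) distance in place of a geodesic distance, and all names below are assumptions made for illustration.

```python
import math

def fill_distance(samples, reference):
    """Approximate sup_{x in Omega} min_{xi in samples} |x - xi| over a finite reference set."""
    return max(min(math.dist(x, xi) for xi in samples) for x in reference)

def circle_points(n):
    """n equally spaced points on the unit circle, a stand-in for a smooth manifold M."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]

reference = circle_points(3600)   # fine discretization of the limiting set
h_coarse = fill_distance(circle_points(10), reference)
h_fine = fill_distance(circle_points(40), reference)
```

For equally spaced samples the worst-case point is the midpoint of an arc between neighbors, so `h_coarse` equals the chord length 2 sin(π/20) and `h_fine` equals 2 sin(π/80); quadrupling the number of samples reduces the fill distance by roughly a factor of four.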

Data-Driven Approximations U n f over M a Smooth Manifold
The approximation U n f (•) := (P Ωn (•)) • f studied above in Equations 10 and 11 is for the projection-based operator P Ωn : H M → H Ωn , but this expression cannot be evaluated unless the function f is known. Bounds on the error induced by the projection-based approximation U n f are certainly valuable to understand the "worst-case" performance of approximations built from a given finite dimensional space of approximants H Ωn , and they are also important in their role in studying data-dependent approximations U n f g := P Ωn ((P Ωn g) • f ). We discuss these data-dependent approximations next. As shown in Section 4, the operators U n f can be constructed from the input-output samples {(ϕ i , y i )} 1≤i≤n = {(ϕ i , f (ϕ i ))} 1≤i≤n along the discrete trajectory of the system in Equation 1. It is also worth noting that the realization of the coordinate representation of U n f is closely related to the approximation of the Koopman operator that is defined in terms of the EDMD algorithm, provided that the number of samples is equal to the dimension of the space of approximants. This fact is explored in detail in Example 6. The definition of U n f makes sense only so long as (P Ωn g) • f ∈ H M . Thus, a standing assumption in this case is that the pullback space f * (H M ) ⊆ H M . Since (P Ωn g) • f ∈ f * (H M ), this structural assumption is enough to ensure that the data-driven operator U n f is well-defined. Theorem 11 is representative of the type of bound that can be derived in this case. We have a pointwise error bound in terms of the fill distance of the samples Ω n in the manifold M.
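The data-dependent operator U n f g := P Ωn ((P Ωn g) • f ) can be sketched using only input-output pairs (ξ i , y i ) with y i = f (ξ i ). The sketch below again identifies the projection with kernel interpolation at the centers; the Abel-type kernel, the map used to generate the pairs, the observable g = sin, and the helper names are hypothetical, and the construction in Section 4 may differ in its details.

```python
import math

def kernel(x, y, alpha=1.0):
    """Abel-type kernel exp(-alpha * |x - y|) on the real line (illustrative choice)."""
    return math.exp(-alpha * abs(x - y))

def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= factor * M[k][col]
    c = [0.0] * n
    for k in reversed(range(n)):
        tail = sum(M[k][j] * c[j] for j in range(k + 1, n))
        c[k] = (M[k][n] - tail) / M[k][k]
    return c

def f(x):
    """Hypothetical map, used here only to generate the input-output pairs."""
    return 0.5 * x + 0.25

centers = [i / 10 for i in range(11)]   # inputs xi_i
images = [f(x) for x in centers]        # outputs y_i = f(xi_i)

gram = [[kernel(a, b) for b in centers] for a in centers]    # K(xi_i, xi_j)
cross = [[kernel(y, b) for b in centers] for y in images]    # K(y_i, xi_j)

def koopman_coefficients(g):
    """Kernel-basis coefficients of P((P g) o f), computed from the samples only."""
    c = solve(gram, [g(x) for x in centers])                            # coefficients of P g
    pulled = [sum(a * cj for a, cj in zip(row, c)) for row in cross]    # (P g)(y_i)
    return solve(gram, pulled)                                          # interpolate at the xi_i

def evaluate(coeffs, x):
    """Evaluate sum_j coeffs[j] * K(xi_j, x)."""
    return sum(c * kernel(xi, x) for c, xi in zip(coeffs, centers))

d = koopman_coefficients(math.sin)
approx = evaluate(d, centers[4])      # (U^n_f g)(xi_4)
exact = math.sin(f(centers[4]))       # (U_f g)(xi_4) = g(f(xi_4))
```

In coordinates, the map from the coefficients of P g to the coefficients `d` is the matrix obtained by solving the Gram system against the cross matrix, which is the kind of finite dimensional representation that the comparison with EDMD refers to.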

Data-Driven Approximations U n P Ωn f Over a Regular Subset Ω ⊆ X
In the last section it is noted that the data-driven operator U n f is well-defined if we assume that f * (H M ) ⊆ H M . But it is not immediately clear that this assumption is warranted, nor does it appear easy to verify in many cases. Alternatively, we can make a direct structural assumption on the function f that appears in the discrete evolution Equation 1. If we assume that f ∈ (W t,2 (Ω)) d , we show following Theorem 8 that we have the pointwise bound which again depends on the fill distance of the samples Ω n in the limiting set Ω. In contrast to the analysis of the data-dependent operator U n f , however, the data-dependent operator U n P Ωn f requires a higher order embedding of the underlying RKH space in the space of Lipschitz continuous functions. This is achieved in practice by appealing to the embedding of the Sobolev spaces in the Lipschitz spaces, a well-studied topic in the classical analysis of these spaces.

Koopman Approximations over a Smooth Manifold M Embedded in R n
The motivation behind considering the Koopman approximation over a manifold is that the samples generated by the dynamical system cover a smooth manifold in certain practical cases. However, it is essential to note that an explicit description of the manifold (its defining equations) is unknown in most cases. This, in turn, implies that determining the Riemannian metric on the manifold M can be challenging. Theoretically, we can derive such a metric by defining a kernel K on the manifold M. Then we can apply the results discussed in the previous sections to the problem at hand. However, this approach can be complex, even in the most straightforward practical cases. Fortunately, we can invoke the structure of the Euclidean space in which the manifold M is embedded to make the computation easier. Defining reproducing kernels over Euclidean space is a well-studied topic. We can then restrict the kernel to the manifold to obtain the convergence rates over the manifold M. The trace theorem, discussed in Section A.3 and [59], explicitly defines the Sobolev space in which the native RKH space induced by the restricted kernel is contained. It is then possible to understand the convergence rates in the restricted Sobolev space. The kernel restriction to the manifold M results in a loss of smoothness of the Sobolev space, leading to weaker convergence rates. However, this extrinsic approach to approximating the Koopman operator does not require an explicit definition of the kernel over the manifold M, thus making it easier to implement than the intrinsic approach detailed above.
In the remainder of this paper, we give the details of the analysis that establishes and supports the new results summarized above. The presentation of the details includes many auxiliary results that may be of independent interest to the reader, and the authors emphasize these points whenever they appear.

Notation and Symbols
In this section we begin a description of the underlying theory, and make clear common notation and symbols. In the paper, the symbols R, R + , N, N 0 := N ∪ {0} denote the real numbers, nonnegative real numbers, positive integers, and nonnegative integers, respectively. The functions ⌈•⌉ and ⌊•⌋ are the ceiling and floor functions that return the smallest integer greater than or equal to, or the largest integer less than or equal to, a given real number. When we write a ≲ b, it means that there is a positive constant c that does not depend on a, b such that a ≤ c • b. A similar definition holds for the relation '≳', and we write a ≈ b when a ≳ b and a ≲ b. We use X to denote the state space of a system, which is always assumed to be a complete metric space (X, d X ). In the paper the most general system is taken to be a semidynamical system defined in terms of a continuous semigroup {S(t)} t∈T with the time-indexing set T = R + or T = N 0 . By L p (Ω) := L p µ (Ω) we refer to the usual Banach space of p-integrable functions over Ω ⊆ X with respect to the measure µ on X for 1 ≤ p < ∞, with the usual modification for p = ∞. The space C(Ω) is the collection of bounded continuous functions f : Ω → R on Ω ⊂ X equipped with the usual supremum norm, and C(Ω, R d ) is the associated space of vector-valued, continuous, and bounded functions f : Ω → R d . We denote by C 0,1 (Ω) the space of Lipschitz functions over a closed subset Ω, where the norm ∥f ∥ C 0,1 (Ω) = ∥f ∥ C(Ω) + |f | C 0,1 (Ω) is defined in Equation 25 in Section A.3. As discussed more carefully in Section 2.2 below, H X will always denote an RKH space of real-valued functions defined over the state space X that is induced by an admissible kernel K : X × X → R.
When (U, ∥ • ∥ U ) and (V, ∥ • ∥ V ) are normed vector spaces, we say that U is continuously embedded in V whenever U ⊆ V and the canonical injection i : u ∈ U → i(u) = u ∈ V is a bounded operator. That is, ∥u∥ V := ∥i(u)∥ V ≲ ∥u∥ U for all u ∈ U. We abbreviate the property by writing U ↪ V, or by writing the injection i over the arrow ↪ if we want to be precise about which canonical injection i is involved in the embedding.

Structures in RKH Spaces
As noted in the introduction, in this paper it is frequently the case that H X is an RKH space of real-valued functions over the state space X := R d . However, we also study some examples where the entire state space X is in fact a smooth manifold X := M, and we also consider the RKH spaces H M when M is a submanifold that is regularly embedded in the state space X := R d . The theory of RKH spaces in this section is stated with X an arbitrary set, keeping in mind these intended applications. This section reviews the basic definitions and properties of the RKH space, as well as several auxiliary spaces that are commonly associated with such an RKH space H X . These spaces include the pullback space ϕ * (H X ) of time-dependent functions defined in terms of a mapping ϕ : T → X with respect to an arbitrary time set T, the pullback space f * (H X ) of spatial functions defined in terms of a mapping f : X → X, the space of restrictions R Ω (H X ) of functions in H X to a domain Ω ⊂ X, and the subspace H Ω ⊆ H X defined in terms of a generating set Ω ⊆ X. We also discuss and relate interpolation and projection operators defined on these spaces. The following discussion of the supporting theory of RKH spaces is necessarily concise, and the reader is referred to comprehensive treatments in [60][61][62] for a detailed account.

RKH Space H X and H Ω ⊆ H X of Functions over X
The set X is the state space of the uncertain evolution law in this paper, and all the function spaces introduced in the paper are derived from a real RKH space H X of functions over X. A real-valued function K : X × X → R is an admissible kernel function if it is symmetric, continuous, and of positive type. The kernel is of positive type if for any finite collection of points {ξ i } 1≤i≤n ⊆ X, the Grammian or collocation matrix K n := [K(ξ i , ξ j )] is positive semi-definite. The kernel is of strictly positive type if the Grammian matrix is positive definite for any finite collection of distinct points. The real Hilbert space H X is defined in terms of an admissible kernel function [60] as the closure of span{K x | x ∈ X}, with K x (•) := K(x, •) referred to as the kernel basis function located at x ∈ X. The closure is taken with respect to the candidate inner product (•, •) H X that is defined for any two functions K x , K y by (K x , K y ) H X := K(x, y) for all x, y ∈ X. The RKH space H X is said to be the native space induced by the kernel K. An RKH space gets its name from the fact that (f, K x ) H X = f (x) for all f ∈ H X and x ∈ X, which is known as the reproducing property of the RKH space. Not every Hilbert space H is an RKH space: it is known for instance that the Hilbert space of Lebesgue square integrable functions L 2 (X) is not an RKH space. However, it is also known that a Hilbert space H of functions is an RKH space if and only if all the evaluation functionals E x : f → f (x) are bounded operators from H to R. A sufficient condition for a uniform bound on these functionals is that there is a constant k such that sup x∈X K(x, x) ≤ k^2.
Then we have the uniform norm bound ∥K x ∥ ≤ k and the uniform operator norm bound ∥E x ∥ ≤ k. Since K that induces H X is assumed to be continuous, the existence of such an upper bound k is sufficient to ensure that It is evident from the above definitions that it is always possible to define a closed subspace H Ω ⊆ H X that is generated by an arbitrary set Ω ⊆ X as the closure of span{K x | x ∈ Ω}. We refer to the set Ω in the definition of the above space H Ω as the generating set of the space H Ω . Note carefully that the elements of H Ω are functions defined over all of X, not functions defined only on Ω. We consider the restriction of functions below in discussions of the RKH spaces R Ω (H X ). In the definition of H Ω , the closure is taken with respect to the norm ∥ • ∥ H X , and H Ω is endowed with the usual topology it inherits as a closed subspace of H X . It is a standard result in the theory of RKH spaces that the space H Ω is an RKH space itself, and its kernel K Ω can be written in terms of the H X -orthogonal projection P Ω of H X onto H Ω and the kernel K as

K Ω (x, y) := (P Ω K x , P Ω K y ) H X     (14)

for all x, y ∈ X. If we set Z Ω := {g ∈ H X | g(x) = 0 for all x ∈ Ω}, then it can be shown directly from the definition of H Ω that we have the H X -orthogonal decomposition H X = H Ω ⊕ Z Ω . The orthogonal complement of H Ω is thereby the set of functions in H X that vanish identically on Ω ⊆ X. In the above discussion, the definition of P Ω is given for scalar-valued functions in H X . We overload this notation for vector-valued functions and define When we have defined a kernel K on X that induces a native space H X , there is a standard way to define an RKH space R Ω (H X ) of restrictions of functions in H X to a subset Ω ⊆ X. This set is given by As discussed in detail in [60,61], the kernel r : for all x, y ∈ Ω. For any h ∈ R Ω (H X ), this definition of the kernel induces a norm that is equivalent to Note that, by virtue of the above definition, it is immediate that ∥R Ω g∥ R Ω (H X ) ≤ ∥g∥ H X for all g ∈ H X , and the operator R Ω : The spaces H Ω and R Ω (H X ) are closely related. We can identify the space Z Ω that appears in the orthogonal decomposition H X = H Ω ⊕ Z Ω as the kernel of the restriction operator R Ω : H X → R Ω (H X ), that is, Z Ω = ker(R Ω ). The restriction map R Ω is linear and bounded on H X . Moreover, exists and is a bounded map from R Ω (H X ) → H Ω . We define the extension operator E X : R Ω (H X ) → H X by the equation The extension operator enables an explicit equivalent expression for the inner product on R Ω (H X ) as and in fact it follows from this definition that the operator norm ∥E X ∥ ≤ 1 too. With this definition of the inner product on R Ω (H X ), it is possible to show that the orthogonal projection P Ω satisfies It is also pointed out in some of the proofs that follow that the identity above implies some useful norm equivalences. For any g ∈ H Ω ⊆ H X we have A function g ∈ H Ω , which is defined on all of X, is the zero function in H Ω if and only if the restriction R Ω g is zero over Ω. As for the projection operator P Ω , we define the action of E X and R Ω on vector-valued functions as
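For orientation, the objects discussed in this subsection take the following forms in the standard theory of RKH spaces [60,61]. These expressions are recorded here as assumptions drawn from that general theory, written with the symbols of the surrounding text; they are a sketch of the usual statements and may differ in normalization or detail from the displays of this paper.

```latex
\begin{align*}
R_\Omega(H_X) &:= \{\, g|_\Omega \;:\; g \in H_X \,\}, \\
r(x,y) &:= K(x,y) \quad \text{for all } x, y \in \Omega, \\
\|h\|_{R_\Omega(H_X)} &:= \inf\{\, \|g\|_{H_X} \;:\; g \in H_X,\ R_\Omega g = h \,\}, \\
E_X &:= \bigl( R_\Omega|_{H_\Omega} \bigr)^{-1}, \\
(g,h)_{R_\Omega(H_X)} &= (E_X g, E_X h)_{H_X}, \\
P_\Omega &= E_X R_\Omega, \\
\|g\|_{H_X} &= \|R_\Omega g\|_{R_\Omega(H_X)} \quad \text{for all } g \in H_\Omega .
\end{align*}
```

Under these assumptions the infimum in the third line is attained by the unique extension that lies in H Ω , which is why the restriction map is an isometry from H Ω onto R Ω (H X ) and why the extension operator has norm at most one.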

The Pullback RKH Spaces γ * (H X ) for γ : S → X
When H X is an RKH space, there is a standard construction of pullback spaces that are defined in terms of H X and a mapping into X, which we describe next. For the RKH space H X , any set S, and any mapping γ : S → X, the pullback RKH space γ * (H X ) is defined to be the set of functions which is the space of compositions of functions in H X with the fixed map γ. The norm on the pullback space γ * (H X ) is defined to be As discussed in [62], the pullback space γ * (H X ) is itself an RKH space with kernel K γ defined as for all τ, s ∈ S. This implies that γ * (H In this paper we interpret the trajectory t → ϕ(t) of the uncertain dynamical system as defining the mapping ϕ : T → X with T := [0, ∞), and the specific pullback space ϕ * (H X ) in this paper is subsequently a space of functions of time. The pullback space ϕ * (H X ) is therefore used to study semiflows in continuous time. We also have occasion to use the pullback space f * (H X ) that is generated by the mapping f : X → X that appears in the discrete evolution law in Equation 1. The spaces f * (H X ) and ϕ * (H X ), together with their kernels, norms, and so on, are defined exactly as in the case of γ * (H X ), but replacing the mapping γ with either f or ϕ, respectively. Finally, we note that since R Ω (H X ) and H Ω are RKH spaces in their own right, it also makes sense to define pullback spaces such as ϕ * (R Ω (H X )) or f * (H Ω ). These spaces are particularly useful when the samples Ξ of the underlying flow are not dense in the state space X, but are dense in a limiting set that is a proper subset Ω ⊂ X.
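A small sketch may help fix ideas about the pullback kernel. It assumes the standard form K γ (τ, s) := K(γ(τ ), γ(s)) from the general theory of pullback RKH spaces, and it takes γ to be a hypothetical periodic trajectory ϕ(t) = (cos t, sin t) with an Abel-type kernel on R 2 ; none of these choices are taken from the paper.

```python
import math

def kernel(x, y, alpha=1.0):
    """Abel-type kernel exp(-alpha * ||x - y||_2) on R^2 (illustrative choice)."""
    return math.exp(-alpha * math.dist(x, y))

def phi(t):
    """Hypothetical trajectory t -> phi(t) that traverses the unit circle."""
    return (math.cos(t), math.sin(t))

def pullback_kernel(tau, s):
    """Assumed pullback kernel K_phi(tau, s) := K(phi(tau), phi(s)), a kernel on time."""
    return kernel(phi(tau), phi(s))

times = [0.1 * i for i in range(5)]
gram_in_time = [[pullback_kernel(a, b) for b in times] for a in times]
same_state = pullback_kernel(0.0, 2 * math.pi)   # two times mapped to the same state
```

Because ϕ(0) and ϕ(2π) are the same point, the pullback kernel cannot distinguish these two times, which reflects the fact that functions in ϕ * (H X ) depend on time only through the state ϕ(t).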

Example 5
The last few sections have reviewed some of the general properties of RKH spaces, and in this section we summarize a few of the common kernels that define the RKH spaces used in the examples. There is a large variety of admissible kernels $K : X \times X \to \mathbb{R}$ that induce native spaces $H_X$. See [57] for a good overview that focuses for the most part on kernels defined over subsets $\Omega \times \Omega \subseteq \mathbb{R}^d \times \mathbb{R}^d$. A very brief discussion of certain kernels defined on smooth manifolds $M$ can be found in Chapter 17 of the same reference [57], but most of the guarantees of convergence for manifolds in this paper are cast in terms of kernels described in the family of recent papers [63][64][65]. We should also point out that the kernels described below and used in this paper are all examples of positive definite kernels. As carefully discussed in [57], there is a larger family of conditionally positive definite kernels, but we will not consider this level of generality when building the native spaces in this paper.
Exponential Kernels: Of all of the kernels that appear in applications to dynamical systems, the exponential kernels seem to be the most popular. Many kernels can be defined in terms of the exponential function, and we refer in this paper to a few that have the form $K_{\alpha,p,q}(x, y) := e^{-\alpha \|x - y\|_q^p}$ with $\alpha > 0$ a real constant and $1 \le p, q \le \infty$. The most well-known of these kernels are the Gaussian kernels $K_{\alpha,2,2}$, but we use the Abel kernel $K_{\alpha,1,2}$ and the $\ell^1$-exponential kernel $K_{\alpha,1,1}$ too because of their excellent separation properties. See [58], Section 3 for an excellent commentary on the separation properties of some of these exponential kernels. The exponential kernel is popular since it has a simple closed form representation, it is smooth, and it is a positive definite kernel for any $\alpha > 0$. This kernel also appears in many papers that study statistical or machine learning theory, or Gaussian process methods, in terms of RKH spaces. See [66] for a clear and concise discussion of the interrelationships among these fields. Those familiar with Bayesian estimation will recognize that native spaces defined in terms of exponential functions are used to characterize covariance operators.
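As an illustration of the three exponential kernels named above, the following is a minimal Python sketch. It assumes the kernels take the form $K_{\alpha,p,q}(x,y) = \exp(-\alpha\|x-y\|_q^p)$, which matches the Gaussian ($p=q=2$), Abel ($p=1$, $q=2$), and $\ell^1$-exponential ($p=q=1$) cases; the function names `exponential_kernel` and `gram_matrix` are hypothetical and chosen only for this example.

```python
import numpy as np
from functools import partial

def exponential_kernel(x, y, alpha=1.0, p=2, q=2):
    """Assumed form K_{alpha,p,q}(x, y) = exp(-alpha * ||x - y||_q ** p)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-alpha * np.linalg.norm(diff, ord=q) ** p))

# The three special cases mentioned in the text.
gaussian = partial(exponential_kernel, p=2, q=2)        # K_{alpha,2,2}
abel = partial(exponential_kernel, p=1, q=2)            # K_{alpha,1,2}
l1_exponential = partial(exponential_kernel, p=1, q=1)  # K_{alpha,1,1}

def gram_matrix(points, kernel):
    """Grammian [K(xi_i, xi_j)] for a finite list of sample points."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    return np.array([[kernel(pts[i], pts[j]) for j in range(n)]
                     for i in range(n)])

# Four distinct samples in the plane (illustrative data only).
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
G = gram_matrix(samples, gaussian)
min_eig = np.linalg.eigvalsh(G).min()  # positive for a positive definite kernel
```

For distinct samples, the smallest eigenvalue of the Grammian is strictly positive, which is the finite-dimensional statement of positive definiteness used throughout the paper.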

Inverse Multiquadrics
The kernels $K_\beta(x, y) := 1/(c^2 + \|x - y\|^2_{\mathbb{R}^d})^\beta$ are known as the family of inverse multiquadrics, and they have much in common with the exponential kernels. For any $\beta > 0$ these functions are also smooth, have a simple closed form representation, and are positive definite on $\mathbb{R}^d \times \mathbb{R}^d$. In principle, the exponential or inverse multiquadric kernels can be used to define interpolation operators that have convergence rates of arbitrarily high polynomial order, even exponential rates of convergence, as discussed in Section 11.4 of [57]. Unfortunately, the condition number of the Grammian matrices for exponential kernels can be prohibitively high, which induces numerical stability issues that can make implementations problematic.
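The growth of the Grammian condition number can be observed in a small experiment. The sketch below is illustrative only: the equispaced samples on $[0,1]$, the shape parameters ($\alpha = 50$, $c = 0.2$, $\beta = 1$), and the helper name `gram` are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def gram(points, profile):
    """Grammian [k(|xi_i - xi_j|^2)] for 1-D samples and a radial profile k."""
    r2 = (points[:, None] - points[None, :]) ** 2
    return profile(r2)

# Radial profiles written as functions of the squared distance r2.
gaussian = lambda r2, alpha=50.0: np.exp(-alpha * r2)
inverse_mq = lambda r2, c=0.2, beta=1.0: 1.0 / (c ** 2 + r2) ** beta

conds = {}
for n in (5, 10, 20):
    pts = np.linspace(0.0, 1.0, n)  # equispaced samples, fill distance ~ 1/(2(n-1))
    conds[n] = (np.linalg.cond(gram(pts, gaussian)),
                np.linalg.cond(gram(pts, inverse_mq)))

for n, (cg, cm) in conds.items():
    print(f"n={n:3d}  cond(Gaussian)={cg:.3e}  cond(inverse MQ)={cm:.3e}")
```

As the samples fill the interval, the Gaussian Grammian becomes rapidly worse conditioned, which is the numerical stability issue described above.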
Compactly Supported Polynomial RBF of Minimal Degree: We will see that interpolation operators generated in terms of kernel functions of an RKH space are expressed in terms of the inverse of the Grammian matrix $K(\Omega_n) := [K(\xi_i, \xi_j)]_{\xi_i, \xi_j \in \Omega_n}$ associated with a finite number of samples $\Omega_n := \{\xi_1, \ldots, \xi_n\}$. For kernels that are globally supported, like the exponential or inverse multiquadric functions, the Grammian matrix is fully populated. It can be much more efficient to use kernels that are only locally supported, which can imply that many entries of the Grammian are zero. This leads to much more efficient representations of approximations of functions. In a fashion that is reminiscent of techniques from finite element methods [67], there are standard constructions of kernel functions $K_{d,k}$ on $\mathbb{R}^d \times \mathbb{R}^d$ that are piecewise polynomials, have compact support, and are positive definite for any $d \ge 1$ and even integers $k > 0$. The definition of these kernels varies as a function of the dimension $d$ and the even integer index $k > 0$, so they are not as simple in form as either the exponential or the inverse multiquadric functions. See [57], Table 9.1 for a summary of some of the more common, low order versions of these functions.
Intrinsic Matern-Sobolev Kernels: So far, the positive definite kernels have been defined over $\mathbb{R}^d \times \mathbb{R}^d$, which means that their restriction to any subset $\Omega \times \Omega$ defines a positive definite kernel on $\Omega \times \Omega$. In this paper we also make use of Matern-Sobolev kernels that are defined on certain types of smooth Riemannian manifolds $M$. The definitions of these kernels require a bit more information about smooth manifolds. When $M$ is a $d$-dimensional, closed, compact, connected, smooth Riemannian manifold, we denote by $W^{s,2}(M)$ the Sobolev space having real smoothness index $s > 0$. The careful definition and basic properties of these spaces are described in Appendix A. If $s > d/2$, these are examples of RKH spaces. In the case that the smoothness index is an integer $s := m$, the reproducing kernel $K_m$ of the space $W^{m,2}(M)$ is the fundamental solution of the operator equation $Lu := \sum_{j=0}^{m} (\nabla^j)^* \nabla^j u = \delta$ of order $2m$ with $\delta$ the Dirac distribution, $L$ the Laplace-Beltrami operator on $M$, and $\nabla$ the covariant derivative operator on $M$. For some classical choices of the compact manifold $M$, say, if $M$ is the circle $S^1$, the sphere $S^2$, or the torus $T^d$, closed form expressions for the Matern-Sobolev kernel are known, see [63][64][65]. Spherical harmonics on the sphere are eigenfunctions of the Laplace-Beltrami operator $L$ on $S^{d-1}$ [57], for instance. The Matern-Sobolev kernels are strictly positive definite on $M$.
Extrinsic Matern-Sobolev Kernels: In applications to the approximation of Koopman operators, there is an underlying assumption that the dynamics is unknown, or at least uncertain. While the Matern-Sobolev kernels are attractive in that they are an example of a positive definite kernel over a compact manifold, if the exact form of the manifold $M$ is not known, the closed form expression for these kernels will not be readily available. However, the Matern-Sobolev kernels can also be defined for the Sobolev spaces $W^{m,2}(\mathbb{R}^d)$ when $m > d/2$, and closed form expressions for the kernels of these RKH spaces are known. In the case that $M$ is a $k$-dimensional, closed, compact, connected, smooth, regularly embedded submanifold $M \subset X := \mathbb{R}^d$, we can define kernels for $W^{r,2}(M)$ by restricting the kernels defined on $X := \mathbb{R}^d$ to the set $M \subset X$, with certain restrictions on the index $r$. Even when intrinsic kernels on $M$ are unavailable or intractable to compute, this strategy defines a kernel on $M$ that is easy to implement in calculations even if a closed form expression for $M$ is unavailable.

Interpolation and Projection
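Approximations of the Koopman operators are constructed in terms of finite dimensional subspaces of the spaces $H_X$, $H_\Omega$, or $R_\Omega(H_X)$, depending on the particular application. Associated with the finite set of points $\Omega_n := \{\xi_1, \ldots, \xi_n\}$ we let $H_{\Omega_n}$ be the $n$-dimensional space of approximants $H_{\Omega_n} := \mathrm{span}\{K_{\xi_i} \mid \xi_i \in \Omega_n\}$. We define the $H_X$-orthogonal projection $P_{\Omega_n} : H_X \to H_{\Omega_n}$ in the usual way, in terms of the identity $((I - P_{\Omega_n})f, g)_{H_X} = 0$ for all $g \in H_{\Omega_n}$. We define the interpolation operator $I_{\Omega_n} : H_X \to H_{\Omega_n}$ to be the unique operator that satisfies the interpolation conditions $(I_{\Omega_n} f)(\xi_i) = f(\xi_i)$ for each $f \in H_X$ and all $\xi_i \in \Omega_n$. While it is not true for functions in a general Hilbert space, for the RKH space $H_X$ the interpolation and projection operators coincide, $I_{\Omega_n} = P_{\Omega_n}$. This follows from the reproducing property, since $((I - P_{\Omega_n})f, K_{\xi_i})_{H_X} = ((I - P_{\Omega_n})f)(\xi_i)$, so the orthogonality conditions and the interpolation conditions are the same set of equations.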
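The interpolant onto $H_{\Omega_n}$ can be computed by solving a linear system with the Grammian matrix. The following is a minimal sketch under stated assumptions: the one-dimensional domain $[0,1]$, the Gaussian kernel with $\alpha = 10$, the target function $\sin$, and the helper names `gaussian_gram` and `interpolant` are all illustrative choices, not part of the paper.

```python
import numpy as np

def gaussian_gram(a, b, alpha=10.0):
    """Matrix [K(a_i, b_j)] for the Gaussian kernel exp(-alpha * (a - b)^2)."""
    return np.exp(-alpha * (a[:, None] - b[None, :]) ** 2)

def interpolant(f, xi, alpha=10.0):
    """Return x -> sum_j c_j K(x, xi_j) with c = K(Omega_n)^{-1} f(Omega_n)."""
    K = gaussian_gram(xi, xi, alpha)
    c = np.linalg.solve(K, f(xi))  # coefficients of the interpolant
    return lambda x: gaussian_gram(np.atleast_1d(x), xi, alpha) @ c

xi = np.linspace(0.0, 1.0, 8)  # samples Omega_n (illustrative)
f = np.sin                     # function to be interpolated (illustrative)
I_f = interpolant(f, xi)

# Interpolation conditions: the residual vanishes on Omega_n up to round-off.
residual_on_samples = np.max(np.abs(I_f(xi) - f(xi)))
# Off the samples, the error is generally nonzero.
x_test = np.linspace(0.0, 1.0, 101)
error_off_samples = np.max(np.abs(I_f(x_test) - f(x_test)))
```

Because the residual vanishes on $\Omega_n$, the reproducing property implies that it is orthogonal to every $K_{\xi_i}$, which is why the same coefficients also represent the orthogonal projection.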
In the same fashion, we define the R Ω (H X )-orthogonal projections P Ωn and I Ωn with respect to the ndimensional approximant spaces R Ω (H Ωn ) that are given by R Ω (H Ωn ) := span{r ξi | ξ i ∈ Ω n } where as discussed above r ξ := R Ω K ξ for ξ ∈ Ω.The projections P Ωn and P Ωn are related by the identities These identities can be inferred by manipulation of coordinate expressions, as is carried out in [68].Alternatively, we know that for each g ∈ R Ω (H X ) the interpolation operator I Ωn satisfies (I Ωn g)(ξ) = g(ξ i ) for all ξ i ∈ Ω n .
Because Ω n ⊆ Ω, we have for each ξ i ∈ Ω n .Since the range of E X I Ωn and P Ωn E X are n-dimensional, these relations and the positive definiteness of the kernel suffice to prove that E X I Ωn = E X P Ωn = P Ωn E X = I Ωn E X .We note that identities such as the above have been used elsewhere in the study of approximations in RKH spaces, see [69].
The operators P Ωn , P Ωn above are defined for scalar-valued functions.With an abuse of notation we will overload these definitions so that they apply component-wise on vector-valued functions.We set

Sobolev Spaces
The theory of RKH spaces outlined in the last section is general and allows for functions defined on an arbitrary set $X$. The strongest convergence results in this paper are derived when the RKH space $H_X$, or one of its associated spaces such as $H_\Omega$ or $R_\Omega(H_X)$, can be interpreted as a Sobolev space. The Sobolev spaces are some of the most common spaces used to describe smoothness and approximation properties of functions [70]. In this paper we will use Sobolev spaces $W^{r,2}(\Omega)$ for real $r > 0$ that contain functions defined over a subset $\Omega \subseteq X := \mathbb{R}^d$, as well as the spaces $W^{r,2}(M)$ that contain functions defined over certain types of manifolds $M \subseteq X$. The definitions and theorems for these spaces are very similar in form, but differ in the fine details of certain hypotheses. A more detailed discussion of Sobolev spaces is given in Appendix A.

Dynamical Systems and Koopman Theory
As noted in the introduction, Koopman theory has been developed for deterministic or stochastic systems, and for both of these classes in discrete or continuous time. Variants of the theory have most often been studied when the unknown underlying system takes values in the Euclidean space $\mathbb{R}^d$, but some work has also been carried out for evolutions that take values in certain types of manifolds $M$. While we certainly cannot treat all the diverse and nuanced differences in these cases, it is a primary goal of this paper to derive error bounds in a framework which, hopefully, can be extended to many other scenarios. We limit the study here to two types of deterministic dynamic systems, one class in continuous time, and a second in discrete time. We find that the derivations of the rates of convergence of the approximations of the Koopman operators for the two classes have much in common. Also, as discussed more fully in [38], the error analysis of stochastic Koopman operators can be carried out by decomposing the total error into an approximation space error that is known as the bias term, and a stochastic error that is referred to as the variance term. This is a classical decomposition of the error for function estimates that are built from stochastically generated samples. This paper, then, can be viewed as a study of the approximation space or bias error term: it can inform the analysis of part of the total error for stochastic Koopman operators. The full study of the stochastic case would far exceed the scope of this paper, and we leave the topic for future study.

Deterministic Evolutions in Continuous Time
We first discuss deterministic evolutions in continuous time, and we denote the time indexing set $T := \mathbb{R}^+$. We assume that the underlying unknown dynamics constitute a continuous semidynamical system that we take to be defined in terms of a continuous semigroup of operators $\{S(t)\}_{t \in T}$ that act on the complete metric space $(X, d_X)$. In most of the examples we choose $X := \mathbb{R}^d$, although we do consider cases in which the evolution takes place in an embedded submanifold $M \subseteq X$. As per the usual definitions [71,72], the positive or forward orbit $\Gamma^+(\phi_0)$ through the initial condition $\phi_0 \in X$ is $\Gamma^+(\phi_0) := \bigcup_{t \in T} S(t)\phi_0$, with $t \mapsto \phi(t) := S(t)\phi_0$ the motion or trajectory through $\phi_0$ at $t = 0$. It will be useful to distinguish the positive limit set $\omega^+(\phi_0)$ associated with an orbit $\Gamma^+(\phi_0)$: $\omega^+(\phi_0) := \bigcap_{\tau \in T} \overline{\bigcup_{t \ge \tau} S(t)\phi_0}$. The positive limit set will be important to our considerations in that it will make precise the largest set for which some of the error bounds for the data-dependent approximations make sense. Note that if the forward orbit $\Gamma^+(\phi_0)$ is precompact, then the positive limit set $\omega^+(\phi_0)$ is nonempty, connected, compact, and attracts the flow. The positive limit set in this case provides a well-defined starting point from which to extract sequences to generate convergent approximations of Koopman operators in this paper.
Associated with the state space $X$, a real-valued RKH space $H_X$ of functions over $X$ is defined in terms of an admissible kernel $K : X \times X \to \mathbb{R}$. We define the orbital Koopman operator $U_\phi$ for a trajectory $t \mapsto \phi(t) := S(t)\phi_0$ of the system via the familiar relation
$$U_\phi g := g \circ \phi, \tag{15}$$
and in this way the orbital Koopman operator is a map $U_\phi : H_X \to \phi^*(H_X)$, with $\phi^*(H_X)$ the pullback space generated by the map $\phi : T \to X$ and $T := [0, \infty)$. In fact $U_\phi$ is a bounded and linear mapping between $H_X$ and $\phi^*(H_X)$. Boundedness follows since
$$\|U_\phi g\|_{\phi^*(H_X)} = \inf\{ \|h\|_{H_X} \mid h \in H_X,\ h \circ \phi = g \circ \phi \} \le \|g\|_{H_X}, \tag{16}$$
which shows that $\|U_\phi\| \le 1$.
The definition of the orbital Koopman operator $U_\phi$ in Equation 15 is given for $g \in H_X$, but we also study these operators for $g \in R_\Omega(H_X)$. Since in this case $g : \Omega \to \mathbb{R}$, the Koopman operator $U_\phi g := g \circ \phi$ makes sense when the orbit $\Gamma^+(\phi_0) \subset \Omega$. This case is used when the samples $\Xi$ do not fill the state space $X$, but are dense in the limiting set $\Omega \subset X$. Carefully note that the identity $U_\phi g := g \circ \phi$ actually defines two different operators for the cases when $g \in H_X$ or $g \in R_\Omega(H_X)$, since they have different domains. We will abuse notation and refer to both as $U_\phi$. Since, following the same logic as in Equation 16 above, we have $\|U_\phi g\|_{\phi^*(R_\Omega(H_X))} \le \|g\|_{R_\Omega(H_X)}$, it follows that the operator norm $\|U_\phi\| \le 1$ in either definition of $U_\phi$.
Those familiar with constructions in RKH spaces will recognize the above definition of the Koopman operator $U_\phi$ as that of the composition operator encountered in RKH theory [62]. On the other hand, those familiar with Koopman theory will note that this definition differs somewhat from that given for continuous time systems. For continuous evolutions, it is more common to define a different Koopman operator, one that is time-dependent. The time-dependent Koopman operator is defined via the identity $(U_t f)(x) := f(S(t)x)$, and then we have $U_t : H_X \to H_X$. Carefully observe that the domain and range of the operator $U_t$ are the same space, while they are different spaces entirely for the orbital Koopman operator. It is this difference that motivates referring to $U_\phi$ as the orbital Koopman operator, since it depends on the entire orbit and is not an operator defined for fixed $t$. We do not study the approximation of the operator $U_t$ in this paper. Historically, the authors' interest in bounds for approximations of the orbital Koopman operator $U_\phi$ has arisen in the study of the RKH embedding method for uncertain systems in [56,68].

Deterministic Evolutions in Discrete Time
Evolutions in discrete time in this paper are governed by the nonlinear, deterministic recursion $\phi_{n+1} = f(\phi_n)$, where $\phi_n \in X$, $(X, d)$ is a complete metric space, and $f : X \to X$ is a generally nonlinear, but unknown, function. To keep the exposition and notation simple, we use an overloaded notation to describe this system in a way that is similar to the continuous case. For the discrete system we set the time index set to be the nonnegative integers $T = \mathbb{N}_0$ and let the discrete semigroup be defined as $\{S(t)\}_{t \in T} := \{S(t)\}_{t \in \mathbb{N}_0} := \{f^t\}_{t \in \mathbb{N}_0}$. The positive orbit and positive limit set are correspondingly defined as $\Gamma^+(\phi_0) := \bigcup_{t \in \mathbb{N}_0} f^t(\phi_0)$ and $\omega^+(\phi_0) := \bigcap_{\tau \in \mathbb{N}_0} \overline{\bigcup_{t \ge \tau} f^t(\phi_0)}$. This overloaded notation allows the discussion of the error bounds in the discrete and continuous time cases to have a common structure.
The Koopman operator for the discrete semiflow is the operator $U_f$ given by $U_f g := g \circ f$ for each $g \in H_X$. Again, the Koopman operator $U_f$ is a mapping from $H_X$ to the pullback space $f^*(H_X)$. It is linear and bounded, and boundedness follows from exactly the same argument by which the orbital Koopman operator $U_\phi$ is shown to be bounded. As in our discussion of the orbital Koopman operator $U_\phi$, we use the same notation $U_f$ to denote the operator defined by the identity $U_f g := g \circ f$ for $g \in R_\Omega(H_X)$. Again, this definition makes sense as long as the orbit $\Gamma^+(\phi_0)$ takes values in the subset $\Omega \subset X$. This definition is used in cases in which the samples $\Xi$ do not fill the state space $X$, but are dense in a proper subset $\Omega \subset X$. By the same reasoning as outlined above, the operator norm $\|U_f\| \le 1$ as a mapping $U_f : R_\Omega(H_X) \to f^*(R_\Omega(H_X))$.

Samplings and Fill Distances
To begin our discussion of sampling, we treat the case that samples come from a single orbit $\Gamma^+(\phi_0)$. For either of the systems above, characterized by the semigroups $\{S(t)\}_{t \in T}$ with either $T = [0, \infty)$ or $T = \mathbb{N}_0$, we build approximations from a single orbit $\Gamma^+(\phi_0)$ based on some set of distinct samples $\Xi \subset X$ with $\Xi := \{\xi_i := S(t_i)\phi_0 \mid t_i \in T,\ i \in \mathbb{N}\} \subseteq \Gamma^+(\phi_0)$. Note that this definition does not require that $t_i \to \infty$ for the continuous time case, nor that the sampling is uniform in either case, although these will be the most common cases in applications. Approximation of functions and operators in the paper is achieved using the finite dimensional spaces $H_{\Omega_n}$ defined as $H_{\Omega_n} := \mathrm{span}\{K_{\xi_i} \mid \xi_i \in \Omega_n\}$ with $\Omega_n := \{\xi_1, \ldots, \xi_n\}$. The sets $\Omega_n$ are nested with $\Omega_n \subset \Omega_{n+1}$, and the approximant spaces are likewise nested, $H_{\Omega_n} \subset H_{\Omega_{n+1}}$. Correspondingly, for the continuous time case we also make use of the finite sets of discrete sample times $\{t_1, \ldots, t_n\} \subset T$ that generate $\Omega_n$. We denote by $\Omega$ the limiting set that is approximated by the samples in the sense that $\Xi$ is dense in $\Omega$, that is, $\Xi \subseteq \Omega$ and $\Omega \subseteq \overline{\Xi}$. Clearly, one candidate for the limiting set is given by $\Omega := \overline{\Xi}$, but we also allow for open sets $\Omega$. The rate of convergence will ultimately be expressed in terms of the fill distance that is defined for any finite set $\Omega_n \subset \Omega$ as $h_{\Omega_n,\Omega} := \sup_{x \in \Omega} \min_{\xi_i \in \Omega_n} d_X(x, \xi_i)$. It should be noted that in order for the fill distance to be finite, $\Omega$ must be bounded. In this paper when we say that the samples $\Xi := \bigcup_{n \in \mathbb{N}} \Omega_n$ fill up the limiting set $\Omega$, we mean that the fill distance approaches zero, $h_{\Omega_n,\Omega} \to 0$ as $n \to \infty$.
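The fill distance can be estimated numerically by replacing the supremum over $\Omega$ with a maximum over a fine set of candidate points. The sketch below is illustrative and rests on assumptions not taken from the paper: the limiting set is the unit circle in $\mathbb{R}^2$, the samples come from an orbit of the rotation by the angle $2\pi(\sqrt{5}-1)/2$, the metric $d_X$ is the Euclidean distance, and the helper names are hypothetical.

```python
import numpy as np

def fill_distance(samples, candidates):
    """Approximate sup_{x in Omega} min_i ||x - xi_i|| using candidate points
    that stand in for Omega (the value is a lower estimate of the true sup)."""
    d = np.linalg.norm(candidates[:, None, :] - samples[None, :, :], axis=2)
    return float(d.min(axis=1).max())

def circle_orbit(n, angle):
    """First n points of the orbit of the rotation by `angle` starting at (1, 0)."""
    t = angle * np.arange(n)
    return np.column_stack([np.cos(t), np.sin(t)])

angle = np.pi * (np.sqrt(5.0) - 1.0)  # irrational multiple of 2*pi
theta = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
omega = np.column_stack([np.cos(theta), np.sin(theta)])  # dense proxy for Omega

# Nested sample sets Omega_10, Omega_100, Omega_1000 from the same orbit.
h = {n: fill_distance(circle_orbit(n, angle), omega) for n in (10, 100, 1000)}
for n, hn in h.items():
    print(f"n={n:5d}  h ~ {hn:.4f}")
```

Since the sample sets are nested, the computed fill distances are nonincreasing in $n$, and for this dense orbit they tend to zero.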
In general an orbit $\Gamma^+(\phi_0)$ of a semidynamical system need not have any structure that is readily characterized, and this fact will be reflected in the samples $\Xi := \Xi(\phi_0)$ extracted from $\Gamma^+(\phi_0)$. Different error bounds for the Koopman operators in this paper will be derived depending on how much structure the set $\Omega$ exhibits. The following list of examples illustrates some of the situations that may be encountered in practice.
Samples $\Xi$ are dense in the entire state space, $\Omega := X$: In some sense this is one of the most basic situations that must be studied. Note that if we want $h_{\Omega_n,\Omega} \to 0$ as $n \to \infty$, we must have $\Omega$ bounded in this case, which puts limits on the choice of the state space $X$. The state space $X$ cannot be a vector space, for instance, if we want the fill distance of finite sets to converge to zero. It makes sense then to focus on cases where the samples $\Xi$ are dense in a compact state space and $\Omega := X$. One canonical system of this type is Hamiltonian flow on the torus $T^2$, since it is well-known that the trajectory $t \mapsto \phi(t)$ starting from any initial condition is dense in the torus. Since the torus $T^2$ is a compact submanifold of $\mathbb{R}^3$, it follows that, at least in principle, a collection of samples $\Xi$ can be generated to fill the entire state space $X = \Omega = T^2$ in the sense that $h_{\Omega_n,\Omega} \to 0$ as $n \to \infty$. Here we might be interested in the approximation of an observable function $h : T^2 \to \mathbb{R}$ defined over the torus. More generally, this case includes flows that are dense in a compact manifold $M = X$, and a typical task would be the approximation of an observable function $h : M \to \mathbb{R}$ defined over the manifold.
Samples are dense in an arbitrary subset $\Omega \subset X$: Here, when we say arbitrary, we mean that $\Omega$ does not exhibit any readily characterized structure. An example of this case arises when $\Omega$ is an arc of the motion, that is, the range of the trajectory over some fixed, bounded time interval. In Example 2 this could correspond to the approximation of the Koopman operator over some "transient regime" of the motion. The function spaces for characterizing rates of convergence of approximations in this case must make sense for such subsets. For this class of problems, approximation error is measured in the norm of a native space $H_\Omega \subseteq H_X$, since the native space $H_\Omega$ can be defined for any subset $\Omega \subset X$, even those that have zero measure as subsets of $X := \mathbb{R}^d$. Error bounds in this case are expressed in terms of the norm of the complementary projection operators $\|(I - P_{\Omega_n})g\|_{H_\Omega}$. If the rate at which $\|I - P_{\Omega_n}\| \to 0$ is known, then we obtain rates of convergence of Koopman operator approximations.
Samples dense in regular subsets Ω ⊆ X or manifolds M ⊆ X: The last category did not require assumptions regarding the properties of the limiting subset Ω.However, when Ω exhibits more structure, stronger conclusions regarding convergence can be derived.When Ω is either a sufficiently regular subset Ω ⊂ X, or even a submanifold M ⊆ X, we choose to express errors in terms of Sobolev norms that are equivalent to R Ω (H X ) in some cases.Here regularity refers to the regularity of the domain, and the precise notion of what is needed is described later in the paper.The error bounds derived in this case are in terms of the fill distance h Ωn,Ω of the sets Ω n in the limiting set Ω.

Approximations of the Koopman Operator U f
In this section we carry out the details of the analysis of the errors in approximating the Koopman operators $U_f : H_X \to H_X$ associated with the discrete dynamical system, and the orbital Koopman operator $U_\phi$ for the continuous time dynamical system. We first study $U_f$ in this section since this case seems most relevant to a host of recent papers that study approximation of such Koopman operators. We subsequently derive error bounds for the orbital Koopman operator in Section 5, but only give brief outlines of how the proofs in Section 4 must be modified for the continuous time case. In both cases, we build up and refine error estimates based on how much we know about the structure of the limiting set $\Omega \subseteq \overline{\Xi}$.
Two general types of approximation are defined for the Koopman operator $U_f$ in this section. First, we define $U_f^n := U_f P_{\Omega_n}$ with $U_f^n g := (P_{\Omega_n} g) \circ f$, where it is assumed that $f : X \to X$. An explicit representation of this operator is given by $(U_f^n g)(x) = \sum_{i,j=1}^{n} \left(K^{-1}(\Omega_n)\right)_{ij}\, g(\xi_j)\, K(\xi_i, f(x))$ for each $x \in X$ and $g \in H_X$, with $K^{-1}(\Omega_n)$ the inverse of the Grammian matrix $K(\Omega_n) := [K(\xi_i, \xi_j)]$ associated with the samples in $\Omega_n$. The above expression makes sense when the limiting set $\Omega = X$. The above definition also makes sense if $g \in R_\Omega(H_X)$, $f : \Omega \to \Omega$, and $t \mapsto \phi(t) \in \Omega \subseteq X$, since in this case it is the coordinate expression for the data-driven approximation $U_f^n g := (P_{\Omega_n} g) \circ f$. This fact follows since the kernel that generates $R_\Omega(H_X)$ is $r := K|_{\Omega \times \Omega}$, and the Grammian matrix $R(\Omega_n) := [r(\xi_i, \xi_j)] = K(\Omega_n)$. This latter definition of $U_f^n g := (P_{\Omega_n} g) \circ f$ is used to study cases when the samples $\Xi$ fill $\Omega$, but do not fill all of $X$. As discussed in Section 2.2, in this expression the projections $P_{\Omega_n}$ are the orthogonal projections of $R_\Omega(H_X)$ onto $R_\Omega(H_{\Omega_n}) := \mathrm{span}\{r_{\xi_i} \mid \xi_i \in \Omega_n\}$.
Note that U n f can be applied to a function g only if f is known in closed form in either of the above definitions of U n f , so these approximations are not realizable in applications that seek data-driven approximations.Still, bounds on the error in approximating U f by U n f are of theoretical interest in their own right.They tell us upper bounds on the approximation rates that can be achieved from a given finite dimensional subspace H Ωn , if the function f is known.In this sense they describe limits on what is feasible or achievable in principle when using the subspaces H Ωn .Pragmatically speaking, these bounds are also directly applicable in two specific, relevant problems.These bounds can be used to derive rates of convergence of estimates for the reproducing kernel Hilbert (RKH) space embedding method, see Section III of [68].They are also important in that they are used to construct some error bounds for the data-driven approximations that follow.
For constructing data-dependent approximations of the Koopman operator, we discuss two different types of approximations, each of which makes a different type of structural assumption about the nature of the Koopman operator.In the first data-driven method, we assume (1) that the samples Ξ are dense in a limiting set M := X that is a type of smooth manifold, and we also require that (2) the Koopman operator U f is such that the pullback space f * (H X ) ⊂ H X .In this case we can define the approximation U n f := P Ωn ((P Ωn g) • f ).Since by definition P Ωn : H X → H X , this definition makes sense as long as f * (H X ) ⊂ H X because (P Ωn g) is then an element of H X .As we explain in Example 6, if g := 1≤j≤n c j K ξj , the operator U n f has the coordinate representation In fact, Example 6 shows that the operator above, which acts on functions, is a special case of that induced in the usual derivation of the EDMD algorithm in references such as [21,31,41,42] The definition of the approximation U n f makes sense provided that the pullback space f * (H X ) ⊆ H X .As useful as this assumption is, it may not be a simple condition to establish or verify.This fact motivates our second method, which is intended to be applicable in some cases when it is not clear that f * (H X ) ⊆ H X .In the second method, we assume that (1) the samples Ξ are dense in a limiting set Ω that is a sufficiently regular, proper subset of X := R d , and (2) the unknown function f ∈ H d X .In contrast to the approximations U n f defined above that make structural assumptions on the composite function U f g := g •f , in this case we make hypotheses directly about the function f .Data-driven approximations U n P Ωn f of the Koopman operator are generated from expressions like U n P Ωn f g := U P Ωn f P Ωn g := (P Ωn g) • (P Ωn f ).
For completeness, we record the explicit representations where the samples Ω n := {ξ i } 1≤i≤n = {ϕ i } 1≤i≤n .Note also that since by assumption the discrete evolution is given by y i := ϕ i+1 = f (ϕ i ), the data-dependent approximation U n P Ωn f g := U P Ωn f P Ωn g can be formed from the pairs of samples At the end of this section we combine the error bounds for U n f and U n P Ωn f to generate overall error estimates for the data-driven approximations of the Koopman operators.We build bounds on rates of convergence using the triangle inequalities that have the form with B a Banach space of functions.
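The data-dependent approximation $(P_{\Omega_n} g) \circ (P_{\Omega_n} f)$ built from the sample pairs $(\phi_i, \phi_{i+1})$ can be sketched numerically as follows. This is a minimal illustration under assumptions that are not part of the paper: the discrete map is a rotation of the unit circle in $\mathbb{R}^2$, the kernel is the Abel kernel $\exp(-\alpha\|x - y\|_2)$ with $\alpha = 1$ (chosen here because its Grammian is comparatively well conditioned), the observable $g$ is arbitrary, and all function names are hypothetical.

```python
import numpy as np

def abel_gram(A, B, alpha=1.0):
    """Matrix [K(a_i, b_j)] for the Abel kernel exp(-alpha * ||a - b||_2)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return np.exp(-alpha * d)

def rotation_orbit(n, angle, x0=(1.0, 0.0)):
    """First n points of the orbit phi_{i+1} = R(angle) phi_i."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    pts = [np.array(x0, dtype=float)]
    for _ in range(n - 1):
        pts.append(R @ pts[-1])
    return np.array(pts)

def koopman_data_driven(g, xi, y, alpha=1.0):
    """Return x -> (P g)((P f)(x)) using only the pairs (xi_i, y_i = f(xi_i))."""
    K = abel_gram(xi, xi, alpha)
    coef_f = np.linalg.solve(K, y)      # interpolate f component-wise
    coef_g = np.linalg.solve(K, g(xi))  # interpolate the observable g
    def apply(x):
        x = np.atleast_2d(x)
        fx = abel_gram(x, xi, alpha) @ coef_f     # (P f)(x)
        return abel_gram(fx, xi, alpha) @ coef_g  # (P g)((P f)(x))
    return apply

angle = np.pi * (np.sqrt(5.0) - 1.0)       # irrational rotation (assumption)
orbit = rotation_orbit(41, angle)
xi, y = orbit[:-1], orbit[1:]              # pairs (phi_i, phi_{i+1})
g = lambda X: X[:, 0] + 0.5 * X[:, 1] ** 2  # illustrative observable
U_n_g = koopman_data_driven(g, xi, y)

# At a sample xi_i whose image y_i is again a sample, both interpolants are
# exact, so the approximation reproduces (U_f g)(xi_i) = g(f(xi_i)).
err_on_samples = np.max(np.abs(U_n_g(xi[:-1]) - g(y[:-1])))
```

No closed form of $f$ is used anywhere in `koopman_data_driven`; only the snapshot pairs enter, which is the sense in which the approximation is data-driven. Away from the samples the error is nonzero and is governed by bounds of the type derived in this section.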

The Approximations U n f
We start with some very straightforward results, ones that are not too surprising and hold without any particular assumptions about the structure of the limiting set Ω.

Approximations
The following theorem bounds the error in approximations $U_f^n$ of $U_f$ in terms of the norms of the complementary projections $I - P_{\Omega_n}$. In Theorem 2 that follows we give sufficient conditions to conclude the pointwise convergence $(U_f^n g)(x) \to (U_f g)(x)$.

Theorem 1 Suppose that the samples $\Xi$ are dense in a bounded set $\Omega \subseteq X$, there is a constant $k > 0$ such that the kernel $K$ that defines the native space $H_X$ satisfies $\sup_{x \in X} K(x, x) \le k^2$, and let $f : X \to X$. Then we have the error bound
$$\|U_f g - U_f^n g\|_{f^*(H_X)} \le \|(I - P_{\Omega_n})g\|_{H_X} \tag{19}$$
for all $g \in H_X$. If furthermore the mapping $f \in C(X, X)$ and $g \in H_\Omega$, we have the pointwise estimate $|(U_f g)(x) - (U_f^n g)(x)| \le k\, \|(I - P_{\Omega_n})g\|_{H_\Omega}$ for each $x \in X$.
Proof We want to derive a bound for the norm of the difference But by definition we have and we see that the operator norm ∥U f ∥ ≤ 1.From this line Equation 19 follows.By definition, the kernel on f * (H X ) is given by K f (x, y) := K(f (x), f (y)).Since sup x K f (x, x) ≤ k2 and and K f is continuous, we know that f * (H X ) ֒→ C(X).We can then write The norm ∥(I − P Ωn )R Ω g∥ R Ω (H X ) = ∥(I − P Ωn )g∥ H Ω → 0 as n → ∞ provided that we can show {P Ωn } n∈N converge to the identity operator on H Ω , or equivalently, {P Ωn } n∈N converge to the identity operator on R Ω (H X ).This is the topic of the next proposition and theorem.
So far, we have established that the error in the approximation of the Koopman operator can be bounded by $\|(I - P_{\Omega_n})g\|_{H_\Omega}$. In many of the theorems of the next few sections, precise bounds for $\|(I - P_{\Omega_n})g\|_{H_\Omega}$ are derived depending on the smoothness properties of the kernel $K$ that defines $H_X$ and the regularity of the limiting set $\Omega$.
In the situation at hand, it is assumed that the samples $\Xi = \bigcup_{n \in \mathbb{N}} \Omega_n$ are dense in $\Omega$. But with so little known about the set $\Omega$, the kernel $K$, and the RKH space $H_X$, as of yet we do not even know that the set of functions $\{K_{\xi_i} \mid \xi_i \in \Xi\}$ is dense in $H_\Omega$. Proving this fact is equivalent to proving that the operator norm $\|I - P_{\Omega_n}\| \to 0$ when here we interpret $P_{\Omega_n} := P_{\Omega_n}|_{H_\Omega}$ as an operator on $H_\Omega$. Establishing this fact is carried out using the following proposition, which is implied by Propositions 1 and 2 of [58].
Proposition 1 Suppose that the admissible kernel $K : X \times X \to \mathbb{R}$ induces the RKH space $H_X$ and that the map $x \mapsto K_x \in H_X$ is injective. Then $d_K : X \times X \to \mathbb{R}$ given by $d_K(x, y) := \|K_x - K_y\|_{H_X}$ is a metric on $X$, and the map $x \mapsto K_x$ is an isometry from $(X, d_K)$ into $H_X$.
We say that the native space $H_X$ separates a subset $S \subset X$ if for each $x \notin S$ there is a function $g \in H_X$ such that $g(x) \neq 0$ and $g(y) = 0$ for all $y \in S$. If $H_X$ separates all the $d_X$-closed subsets of $X$, then $(X, d_K)$ and $(X, d_X)$ are equivalent metric spaces. This result suffices to establish that $\{P_{\Omega_n}\}_{n \in \mathbb{N}}$ converges to the identity operator on $H_\Omega$, as summarized in the following theorem.
Theorem 2 Suppose that the kernel K : X × X → R satisfies the hypotheses of Proposition 1. Then {P Ωn } n∈N converges to the identity operator on H Ω .
Proof The theorem follows if we can establish that H Ω := span{K ξi | ξ i ∈ Ξ}.Since the metric spaces (X, d X ) and (X, d K ) are assumed to be equivalent, there are two positive constants c 1 , c 2 such that for all x, y ∈ X. Suppose that f ∈ H Ω and fix an arbitrary ϵ > 0. By the definition of H Ω , there is an integer n = n(ϵ), real numbers α n,i ∈ R, and locations x n,i ∈ Ω such that But since Ξ is dense in Ω, for each x n,i we can find an ξ n,ℓi ∈ Ξ such that We then have This means that {K ξi | ξ i ∈ Ξ} is dense in H Ω , and the proof is complete.
As noted above, the RKH space $f^*(H_X)$ is defined for any function $f : X \to X$, and Theorem 2 holds even when the limiting set $\Omega$ does not exhibit any particular structure. We next discuss cases when the limiting set $\Omega$ exhibits some additional, but specific, kinds of structure. We begin with a discussion of the case in which the limiting set $\Omega$ is a smooth, connected, compact Riemannian manifold $M := \Omega$, and the samples are dense in $M$. Three types of theorems are derived in this section below: the first uses the many zeros Theorem 11 on manifolds, while the latter two theorems are examples of error bounds that generate pointwise estimates from Theorems 12 and 13. While there are differences in the fine details of the hypotheses for these theorems, it should be noted that all three theorems are quite similar qualitatively speaking. In all cases the errors in approximations of the Koopman operator are bounded by powers of the fill distance $h^r_{\Omega_n,M}$, and the exponent $r$ depends on the smoothness of the functions in certain Sobolev spaces. Note also that all three of these theorems can be applied when the samples fill the entire state space $X$, that is, when $M := \Omega := X$. We begin with a result that depends on the many zeros Theorem 11 in a manifold $M$.

Theorem 3 Suppose that $M$ is a $d$-dimensional, connected, compact, Riemannian manifold without boundary, let $K : M \times M \to \mathbb{R}$ be a positive definite kernel that induces a native space $H_M$, and suppose that $H_M$ is equivalent to the Sobolev space $W^{t,2}(M)$ for some $t \in \mathbb{R}$ that satisfies $d/2 < s \le \lceil t \rceil - 1$ for a given $s \in \mathbb{N}$.
Then there are constants C M , h M > 0 such that for all Ω n ⊂ Ω that satisfy h Ωn,Ω ≤ h M we have Proof Since s > d/2, the Sobolev embedding theorem implies that W s,2 (M ) is a RKH space, and therefore the pullback space f * (W s,2 (M )) is well-defined.By the definition of the pullback space we have As discussed in Section 2.2, the operator norm of U f , as a mapping from the RKH space W s,2 (M ) into f * (W s,2 (M )), is less than or equal to one.But on Ω n , we know that ((I − P Ωn )g)| Ωn = 0 since the projection is identical to the interpolant over Ω n .By the many zeros Theorem 11, we conclude that M ) .Now we turn to a pair of theorems that ensure pointwise convergence rates over manifolds.
Theorem 4 Suppose that the collection of samples Ξ is dense in a compact, connected, C ℓ Riemannian manifold M := Ω without boundary, the kernel K that induces H M is positive definite, the kernel K ∈ C 2k (M × M ), and ℓ ≥ 2k.Then there exist constants h M , C M > 0 such that for all g ∈ H M and Ω n ⊂ M with h Ωn,M < h M we have Proof By definition we have The last line in the above sequence of inequalities follows from the approximation Theorem 13.
Theorem 5 Suppose that the collection of samples Ξ is dense in a d−dimensional, compact, connected, smooth Riemannian manifold M := Ω without boundary, the kernel K that induces H M is positive definite, and H M is equivalent to a Sobolev space W t,2 (M ) for some t > d/2.Then there exist constants h M , C M > 0 such that for all g ∈ H M and Ω n ⊂ M with h Ωn,M < h M we have Ωn,M ∥g∥ H M for all x ∈ M .Proof The proof of this theorem proceeds exactly as in the last Theorem 4, but by replacing the last line of the proof to use Theorem 12 instead of Theorem 13.

Approximations
The last two theorems are designed to apply to cases when the samples $\Xi$ fill up the entire manifold $M := \Omega$ in the sense that $d_{\Omega_n,M} \to 0$ as $n \to \infty$. We now consider cases when the samples $\Xi$ are dense in a limiting set $\Omega$ that is a proper subset of the state space $X := \mathbb{R}^d$.
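The error bounds in this section are all stated in terms of the fill distance of the samples $\Omega_n$ in the limiting set. A minimal sketch of how this quantity might be estimated numerically is given below. It assumes the usual definition $h_{\Omega_n,\Omega} := \sup_{x \in \Omega} \min_{\xi \in \Omega_n} \|x - \xi\|$, replaces the supremum by a maximum over a dense reference sampling of $\Omega$, and uses the Euclidean (extrinsic) distance rather than an intrinsic metric on a manifold. The function names and the unit-circle test set are illustrative assumptions, not taken from the paper.

```python
import math


def fill_distance(centers, domain_points):
    """Approximate h = sup_{x in Omega} min_{xi in Omega_n} |x - xi|.

    The supremum over Omega is replaced by a maximum over the finite
    reference sampling `domain_points` (an assumption of this sketch).
    """
    return max(min(math.dist(x, c) for c in centers) for x in domain_points)


def circle_points(count):
    """Equally spaced points on the unit circle (hypothetical test set)."""
    return [
        (math.cos(2 * math.pi * i / count), math.sin(2 * math.pi * i / count))
        for i in range(count)
    ]


# Dense reference sampling of Omega and two nested sets of centers.
domain = circle_points(3600)
h_coarse = fill_distance(circle_points(8), domain)
h_fine = fill_distance(circle_points(16), domain)
```

Refining the centers from 8 to 16 points reduces the estimated fill distance, which is the quantity that drives the convergence rates in the theorems.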
Theorem 6 Suppose that the collection of samples Ξ is dense in the compact subset Ω ⊂ X := R d that has a Lipschitz boundary, the kernel K induces the RKH space H X of functions over X, there is a constant k such that Then there exists constants h M , C M > 0 such that for all g ∈ W t,2 (Ω) and subsets Ω n with h Ωn,Ω) ≤ h M we have If, furthermore, f ∈ C(Ω, Ω), we have the pointwise estimate Proof From the definition of the pullback space f * (W s,2 (Ω)) and the many zeros Theorem 10, we see that (Ω) .The many zeros Theorem 10 can be applied since the projection P Ωn g is identical to the interpolant of g on Ω n , so that (I − P Ωn )g| Ωn = 0. Since ∥U f ∥ ≤ 1 and ∥I − P Ωn ∥ ≤ 1, the inequality in Equation 21 follows.The estimate above holds for any function f : Ω → Ω. If, in addition, f is continuous, then we have Since K f is the kernel of the RKH space R Ω (H X ), this is sufficient to ensure that f * (W s,2 (Ω)) ֒→ C(Ω).We then have , and the proof is complete.
Before proceeding to our discussion of data-dependent approximations, a few observations are in order to emphasize some of the nuances that distinguish Theorem 6 from its predecessors, Theorems 4 and 5. The most obvious difference is, of course, that in Theorems 4 and 5 the samples are dense in a compact Riemannian manifold $M$, which allows for the possibility that the entire state space $X$ of the underlying flow is a compact manifold $M := X := \Omega$. Note that the bounds on the pointwise error in this case are specified relative to the norm $\|g\|_{H_M}$, and the operator $U_f$ defined as $U_f g := g \circ f$ acts on functions $g$ that are defined over all of the state space $X := M = \Omega$. On the other hand, in Theorem 6 the samples are dense in the proper subset $\Omega \subset X := \mathbb{R}^d$. In other words, the operator $U_f$ in this theorem is understood as an operator $U_f g := g \circ f$ for $g \in W^{s,2}(\Omega)$. Note that in this interpretation the operator $U_f$ is an operator on functions supported on the proper subset $\Omega \subset X := \mathbb{R}^d$, in contrast to the operator $U_f$ in Theorems 4 or 5 that acts on functions supported on the full state space $X := M$.
In writing the original discrete evolution law in Equation 1, we have stipulated only that $f : X \to X$. Connecting the error estimates in Theorem 6 to the original evolution law requires some additional assumptions about the underlying flow and the limiting set $\Omega \subset X := \mathbb{R}^d$. For instance, suppose that we know that $\Omega$ is a positive invariant set for the flow, meaning that $\phi_n \in \Omega$ implies that $\phi_{n+1} \in \Omega$. Under this assumption it must be the case that $f(\Omega) \subseteq \Omega$, and the restricted function $f|_\Omega$ maps $\Omega$ into $\Omega$. In this situation Theorem 6 can be understood as a statement about the approximation of the Koopman operator associated with the restriction $f|_\Omega$ from the samples in $\Xi$ that are dense in $\Omega$.

In this section we explore how the projection approximations $U^n_f g := (P_{\Omega_n} g) \circ f$ introduced above are used to construct error estimates for the data-dependent approximations $U^n_f$ and $U^n_{P_{\Omega_n} f}$.
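The projection approximation $(P_{\Omega_n} g) \circ f$ can be sketched numerically using the fact, noted in the proofs above, that the projection $P_{\Omega_n}$ coincides with the kernel interpolant on the centers $\Omega_n$. The sketch below is only an illustration under stated assumptions: the exponential kernel $\exp(-\|x-y\|)$, the unit circle as the limiting set, the rotation map $f$, and the observable $g(x) = x_1$ are all hypothetical choices made so that the example is self-contained.

```python
import numpy as np


def exp_kernel(x, y, scale=1.0):
    """Hypothetical kernel exp(-|x - y| / scale); x is (p, d), y is (q, d)."""
    diff = x[:, None, :] - y[None, :, :]
    return np.exp(-np.linalg.norm(diff, axis=-1) / scale)


def fit_interpolant(centers, g):
    """Coefficients c of P g = sum_i c_i K(., xi_i), from the Gram system."""
    gram = exp_kernel(centers, centers)
    return np.linalg.solve(gram, g(centers))


def projected_koopman(centers, coeffs, f, x):
    """Evaluate ((P g) o f)(x) = sum_i c_i K(f(x), xi_i)."""
    return exp_kernel(f(x), centers) @ coeffs


# Centers on the unit circle; f rotates by one center spacing, so f maps
# centers to centers (a convenient, assumed test configuration).
n = 12
angles = 2 * np.pi * np.arange(n) / n
centers = np.column_stack([np.cos(angles), np.sin(angles)])
theta = 2 * np.pi / n
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])


def f(x):
    return x @ rotation.T


def g(x):
    return x[:, 0]


coeffs = fit_interpolant(centers, g)
approx = projected_koopman(centers, coeffs, f, centers)
exact = g(f(centers))
```

Because $f$ maps centers to centers in this configuration, the interpolation property makes the approximation agree with $g \circ f$ at the centers up to round-off. Away from such points the error is governed by the fill distance, as in the theorems above. As the text notes, this construction requires $f$ to be known, so it serves mainly to study ideal performance.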

Data-Dependent Approximations
In this example we review the EDMD method, since it is one of the most popular and well-known algorithms studied in the context of Koopman theory. We will see that the data-driven approximation $U^n_f g := P_{\Omega_n}((P_{\Omega_n} g) \circ f)$, which is defined in terms of projections $P_{\Omega_n}$ onto the finite dimensional spaces $H_{\Omega_n}$ in an RKH space, has a coordinate representation that is a special case of that defined by the classical EDMD algorithm. In the last few years, the notation used to present the EDMD algorithm has evolved into a kind of common convention (see for example [21,31,41,42]), and we review this notation in this section. Suppose we have collected the $m$ samples $\{\phi_i, y_i\}_{1 \le i \le m}$ of the discrete dynamics described in the canonical Equation 1. The output is defined as $y_i := f(\phi_i)$. We define the matrices $\Phi$ and $Y$ of inputs and outputs, respectively, as

$$\Phi := [\phi_1, \ldots, \phi_m], \qquad Y := [y_1, \ldots, y_m].$$

In the notation that has become standard in descriptions of the EDMD algorithm, we define the $i$th basis function $\psi_i := K_{\xi_i} \in F_n := \mathrm{span}\{\psi_i \mid 1 \le i \le n\}$, introduce the vector of basis functions as $\psi(x) := \{\psi_1(x), \ldots, \psi_n(x)\}^T \in \mathbb{R}^{n \times 1}$, and set the data matrices

$$\Psi(\Phi) := [\psi(\phi_1), \ldots, \psi(\phi_m)], \qquad \Psi(Y) := [\psi(y_1), \ldots, \psi(y_m)].$$

The EDMD algorithm, viewed in terms of linear algebra, solves for the matrix $A_{n,m} \in \mathbb{R}^{n \times n}$ that satisfies the minimization problem

$$A_{n,m} := \arg\min_{A \in \mathbb{R}^{n \times n}} \|\Psi(Y) - A\,\Psi(\Phi)\|_F$$

with $\|\cdot\|_F$ the Frobenius norm on matrices. The solution of this minimization problem is given by the matrix $A_{n,m} := \Psi(Y)\Psi^+(\Phi)$ with $\Psi^+(\Phi)$ the Moore-Penrose pseudoinverse of the matrix $\Psi(\Phi)$. Finally, the EDMD algorithm defines a data-driven approximation $U_{n,m}$ of the Koopman operator $U_f$ that is given by

$$(U_{n,m} g)(x) := c^T A_{n,m} \psi(x), \qquad c := \{c_1, \ldots, c_n\}^T,$$

for each $g := \sum_{1 \le i \le n} c_i \psi_i \in H_{\Omega_n}$. It is not difficult to relate the data-driven approximation $U^n_f g := P_{\Omega_n}((P_{\Omega_n} g) \circ f)$ to $U_{n,m}$. By definition $U^n_f : H_X \to H_{\Omega_n}$ and $U_{n,m} : H_{\Omega_n} \to H_{\Omega_n}$, and since these two operators have different domains, they cannot be the same operator. However, in some circumstances the two operators do agree on $H_{\Omega_n}$. To establish this relation, we choose the finite dimensional space $F_n := H_{\Omega_n}$, the basis $\psi_i := K_{\xi_i}$ for $\xi_i \in \Omega_n$ with $1 \le i \le n$, and set the number of samples $m$ equal to the dimension of the approximant space $H_{\Omega_n}$, $m = n$.
The equivalence is immediate when we compare the coordinate expression above for $U_{n,m}$ to that of $U^n_f$ in Equation 18. In view of these relationships, we can view one contribution of this paper as defining a specific choice of the bases in the EDMD algorithm that enables precise rates of convergence over some important cases of regular subsets and smooth manifolds. These error estimates are given in terms of the fill distance of samples in the manifold. This result has no precedent in either the study of approximations of Koopman operators or the EDMD algorithm.
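The linear algebra behind the EDMD matrix $A_{n,m} := \Psi(Y)\Psi^+(\Phi)$ can be illustrated with a short sketch. The code below assumes that the data matrices store the feature vectors $\psi(\phi_j)$ and $\psi(y_j)$ as columns, and it uses kernel sections of a hypothetical exponential kernel as the basis, with the inputs taken equal to the centers so that $m = n$. The rotation map and all function names are assumptions made for illustration.

```python
import numpy as np


def feature_matrix(points, centers, scale=1.0):
    """Matrix whose column j is psi(points[j]), with the assumed basis
    psi_i(x) = exp(-|x - xi_i| / scale); shape (n, m)."""
    diff = centers[:, None, :] - points[None, :, :]
    return np.exp(-np.linalg.norm(diff, axis=-1) / scale)


def edmd_matrix(inputs, outputs, centers):
    """Least-squares solution A = Psi(Y) Psi(Phi)^+ of min ||Psi(Y) - A Psi(Phi)||_F."""
    psi_in = feature_matrix(inputs, centers)
    psi_out = feature_matrix(outputs, centers)
    return psi_out @ np.linalg.pinv(psi_in)


# Hypothetical data: samples on the unit circle and a rotation map y = f(phi).
n = 12
angles = 2 * np.pi * np.arange(n) / n
centers = np.column_stack([np.cos(angles), np.sin(angles)])
theta = 0.3
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
inputs = centers                  # m = n, inputs coincide with the centers
outputs = inputs @ rotation.T     # y_i = f(phi_i)

A = edmd_matrix(inputs, outputs, centers)
residual = np.linalg.norm(
    feature_matrix(outputs, centers) - A @ feature_matrix(inputs, centers)
)
```

With $m = n$ and distinct centers, $\Psi(\Phi)$ is the kernel Gram matrix and is invertible for this kernel, so the pseudoinverse reduces to the inverse and the residual vanishes up to round-off.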
Theorem 7 Suppose that the hypotheses of Theorem 3 hold. Furthermore, suppose that the mapping $f : M \to M$ is such that the pullback space $f^*(H_M) \subseteq H_M$. Then there are constants $C_M, h_M > 0$ such that for all $\Omega_n \subset \Omega$ that satisfy $h_{\Omega_n,\Omega} \le h_M$ we have the pointwise bound

Proof Under the hypotheses of this theorem, we have This means that Now we apply the many zeros Theorem 11 to each of the right hand side terms above. We know that since the projection operator $P_{\Omega_n}$ is identical to the interpolation operator on $\Omega_n \subset M$. By the many zeros theorem on the manifold $M$, we can write

Data-Dependent Approximations
We next turn to the study of data-dependent approximations (U n P Ωn f g)(x) := (P Ωn g) • (P Ωn f )(x) for each x ∈ Ω ⊂ X := R d .We focus in this section on the case when the samples Ξ are dense in a sufficiently regular set Ω ⊂ X := R d .We will show that effective error bounds on the convergence rate are derived if, in addition to the hypotheses outlined in Theorem 6, we assume additionally that the RKH space R Ω (H X ) is contained in the space of Lipschitz continuous functions.Theorem 8 Suppose that hypotheses of Theorem 6 hold, additionally that we have the continuous embedding R Ω (H X ) ≈ W t,2 (Ω) ⊆ W s,2 (Ω) ֒→ C 0,1 (Ω), and f ∈ (W t,2 (Ω)) d .Then there are constants h Ω , C Ω > 0 such that for all g ∈ W t,2 (Ω) and Ω n ⊂ Ξ such that h Ωn,Ω < h Ω we have 2 (Ω) ֒→ C 0,1 (Ω), we begin by writing the inequality Since W s,2 (Ω) ֒→ C(Ω), we know that (W s,2 (Ω) Ω)) d .Now the desired result follows from an application of the many zeros Theorem 10, applied to each component of the vector-valued function f .Before concluding this section, we note that the results to Theorems 6 and 8 can be combined to obtain the total error bound with CΩ the constant from Theorem 6 and C Ω the constant from Theorem 8.
5 Approximation of the Orbital Koopman Operator U ϕ

With the analysis in Section 4 of the Koopman operator $U_f : H_X \to f^*(H_X)$ in hand, the study of approximations of the orbital Koopman operator $U_\phi : H_X \to \phi^*(H_X)$ proceeds similarly, but with a few caveats. Differences arise owing to the fact that $\phi : T \to \mathbb{R}^d$ is a function of time, and correspondingly the pullback spaces $\phi^*(H_X)$ or $\phi^*(R_\Omega(H_X))$ contain functions of time. In contrast, approximations of the Koopman operator $U_f$ are expressed in terms of the pullback spaces $f^*(H_X)$ or $f^*(R_\Omega(H_X))$, which consist of spatial functions defined over $X$.
As before, we construct two types of general approximations to $U_\phi$. We set $U^n_\phi g := (P_{\Omega_n} g) \circ \phi$, which is entirely analogous to the definition of $U^n_f$ in Equation 17. The explicit realization of $U^n_\phi$ is written in coordinates for each $t \in T$ and $g \in H_X$. Just about all of the comments and qualifications made about the approximation $U^n_f$ of the Koopman operator $U_f$ can be made regarding the approximation $U^n_\phi$ of the orbital Koopman operator $U_\phi$. The coordinate expression for $U^n_\phi g := (P_{\Omega_n} g) \circ \phi$ coincides with that for the approximation defined by the same formula in the case that $g \in R_\Omega(H_X)$ and $t \mapsto \phi(t) \in \Omega \subset X$. This latter definition is used when the samples $\Xi$ are dense in $\Omega$, but not dense in the entire state space $X$. Again, in either interpretation the approximation $U^n_\phi$ cannot be realized in practice unless a closed form expression for the trajectory $t \mapsto \phi(t)$ is known. Still, error estimates for $U^n_\phi$ are useful for understanding the ideal performance of $U^n_\phi$ in the event that $t \mapsto \phi(t)$ is known. Also, we briefly outline below the fact that error bounds for approximations $U^n_\phi$ of the orbital Koopman operator $U_\phi$ are entirely analogous to those for approximations $U^n_f$ of the Koopman operator $U_f$. On the other hand, data-driven approximations of the orbital Koopman operator $U_\phi$ can be more complicated than those for $U_f$. The data-driven approximation of $U_\phi g = g \circ \phi$ requires an approximation of the spatial function $g$ and an approximation of the time-dependent function $t \mapsto \phi(t)$.
To construct approximations of the time-dependent function, we introduce an additional RKH space H T induced by an admissible kernel p : T × T → R, Up until this point, we have assumed that T = [0, ∞), but now we (again) overload this notation and assume that T ⊂ [0, ∞) is a bounded subset contained in [0, ∞).We define to be the finite set of times corresponding to samples in Ω n ⊂ Ω, and we denote by the finite dimensional approximant spaces contained in H T .The construction of approximations of time-varying functions over the bounded set T is carried out using the H T -orthogonal projections Q Tn : H T → H Tn , which again coincide with the unique interpolation operator over H Tn ⊂ H T .We overload the notation and extend the definition of the projection operator Q Tn so that it applies entrywise to vector-valued functions, We limit our discussion of the data-driven approximation of the Koopman operator U ϕ to the case that the samples Ξ are dense in a proper subset Ω ⊂ X := R d , but are not dense in the entire state space.The data-driven approximation U n Q Tn of the orbital Koopman operator U ϕ g := g • ϕ with g ∈ R Ω (H X ) is now defined to be U n Q Tn ϕ := U Q Tn ϕ P Ωn g := (P Ωn g) • (Q Tn ϕ) for each t ∈ T ⊂ [0, ∞).This operator has the realization with the Grammian matrix P(T n ) := [p(t i , t j )] ti,tj ∈Tn .By inspection, this coordinate expression can be formed in terms of the samples

Approximations U n ϕ
From the similarity of the definitions of the approximation $U^n_\phi$ of the orbital Koopman operator $U_\phi$ to that of the approximation $U^n_f$ of $U_f$, it is not surprising that the associated error bounds are derived in nearly identical fashion. Each of Theorems 1, 4, 5, and 6 is readily transcribed to generate error bounds for $U^n_\phi$ by making the corresponding replacements, for example $f^*(W^{s,2}(\Omega)) \rightsquigarrow \phi^*(W^{s,2}(\Omega))$. With these changes the pointwise estimates for $x \in \Omega$ are also replaced by pointwise estimates for $t \in T$. We do not write modified versions of Theorems 1 through 6 since their form and proofs would be largely redundant. Below we just summarize these results and leave the details to the reader.
For instance, if the hypotheses of Theorem 1 hold, we conclude that ∥U ϕ g − U n ϕ g∥ ϕ * (H X ) ≤ ∥(I − P Ωn )g∥ H X for all g ∈ H X , and further if ϕ : T → R d is continuous that we have the pointwise bound If the hypotheses of Theorem 4 apply, we conclude that M ∥g∥ H M for each t ∈ T , while the left hand side is bounded by Ωn,M ∥g∥ H M if the hypotheses of Theorem 5 are true.Finally, the analysis in Theorem 6 can be extended to the case at hand to show that Ω) , for all g ∈ W s,2 (Ω), under the suitably modified hypotheses of that theorem.Using the fact that the pointwise error in time is less than The derivation of error bounds for the data-dependent approximations U n Q Tn ϕ is modeled on the approximations in Theorems 7 and 8 above, but with changes to account for the approximation of g and ϕ in different finite dimensional spaces of functions.
Theorem 9 Suppose that the set of samples Ξ is dense in the set Ω ⊂ X := R d , the set of times n T n is dense in the compact set T ⊂ R + , the kernel K : X × X → R induces the native space H X over X, g ∈ R Ω (H X ) ֒→ C 0,1 (Ω), and the admissible kernel p : T × T → R induces the RKH space H T ≈ W r,2 (T ) ֒→ W u,2 (T ) with 1/2 ≤ u ≤ ⌈r⌉ − 1.Then there are numbers C T , h T > 0 such that for all T n ⊂ T such that h Tn,T < h T and ϕ ∈ (W r,2 (T )) d we have Ω) with d/2 ≤ s ≤ ⌈t⌉ − 1 and Ω ⊂ X is a compact set with a Lipschitz boundary, we have the total error estimate for each t ∈ T .
Proof We start as in the proof of Theorem 8 by writing Now the result follows from the same argument as in the proof of Theorem 6. We know that the $H_T$-orthogonal projection onto $H_{T_n}$ coincides with the interpolant on the subspace $H_{T_n}$, so that $(I - Q_{T_n})\phi$ is equal to zero when restricted to $T_n$. We apply the many zeros Theorem 10 to each entry to obtain the desired inequality. Using the fact that the error (U , for all g ∈ W s,2 (Ω). Using the triangle inequality, we arrive at the second result of the theorem.

Analytical And Numerical Examples
Example 7 In this paper we have studied the orbital Koopman operator $U_\phi$ for some semidynamical systems in continuous time, and we have noted that it is not the usual Koopman operator defined for time dependent systems. In this example we give one particular application that motivates the study of the orbital Koopman operator. In a series of papers [56,73] the authors have investigated the RKH space embedding method for adaptive estimation of systems that are governed by uncertain nonlinear ODEs. These ODEs are defined in terms of a known matrix $A \in \mathbb{R}^{d \times d}$, a known matrix $B \in \mathbb{R}^{d \times 1}$, and an unknown function $f : \mathbb{R}^d \to \mathbb{R}$. It is shown in [56] that one form of the estimator equations of the RKH embedding method can be written in terms of the state estimate $\hat{\phi}(t) \in \mathbb{R}^d$ of $\phi(t)$, the function $\hat{f}(t) \in H_X$ that is an estimate of $f$, the orbital Koopman operator $U_\phi : H_X \to \phi^*(H_X)$, and the evaluation operator $E_t : \phi^*(H_X) \to \mathbb{R}$ at $t \in \mathbb{R}^+$. These equations define an evolution in the infinite dimensional space $\mathbb{R}^d \times H_X$, so they define an example of a distributed parameter system. As discussed in [56], approximate trajectories are generated by replacing $U_\phi$ with the approximations $U^n_f$ or $U^n_\phi$. The results of this paper can therefore be interpreted as a way of generating estimates of convergence rates for the RKH embedding method.

Example 8
In this example we return to Example 1 and study approximations U n ϕ of the orbital Koopman operator associated with the orbit Γ + (ϕ 0 ) of the trajectory t → ϕ(t) ∈ R 2 through the initial condition ϕ 0 = {−0.2,1.6} T ∈ R 2 .We define the limiting set Ω to be an arc of the motion over the open interval We define a set of samples Ξ := n∈N Ω n where for each n ∈ N we choose Note that we have assumed that all of the finite samples have been selected from the one trajectory t → ϕ(t) that passes through the initial condition ϕ 0 ∈ R d .We allow (but do not require) that the discrete sample sets Ω n are nested, so that Ω n ⊂ Ω n+1 .We require, however, that the fill distance h Ωn,Ω → 0 as n → ∞.Suppose that we choose the kernel K : R 2 × R 2 → R that is defined over the entire real plane.For instance, we can choose K to be Sobolev-Matern kernel K m with m > 1, the Abel kernel K α,1,2 , the ℓ 1 -exponential kernel K α,1,1 , or the Gaussian kernel K α,2,2 , with all of these kernels defined on X × X := R 2 × R 2 .Whatever the specific choice of the kernel K, it defines the RKH space H X of functions over the state space X := R 2 : functions in H X are defined on all of X := R 2 .We also define finite dimensional spaces of approximants With these definitions, we know that all contain functions that are defined over all of the state space X := R 2 .From Theorem 1, we know that It is trivial that for any of the above choices of the kernel K, the map x → K x ∈ H X is one-to-one.
If we can additionally ensure that the kernel $K$ separates all the $d_X := d_{\mathbb{R}^2}$-closed subsets of $X := \mathbb{R}^2$, then we can conclude from Theorem 2 that $\|(I - P_{\Omega_n})g\|_{H_\Omega} \to 0$ as $n \to \infty$ for each $g \in H_\Omega$. As noted in [58], the Abel kernel $K_{\alpha,1,2}$ and the $\ell_1$-exponential kernel separate every closed subset of $\mathbb{R}^2$, so both of these kernels generate convergent approximations in $H_\Omega$. Also, the Matern-Sobolev kernel for $m > d/2 = 1$ induces a native space $H_X$ that is equivalent to the Sobolev space $W^{m,2}(\mathbb{R}^2)$. As discussed more fully in [74], the Sobolev space $W^{m,2}(\mathbb{R}^2)$ contains $C_0^\infty(\mathbb{R}^2)$, and therefore it has a rich family of smooth bump functions. The Sobolev space $W^{m,2}(\mathbb{R}^2)$ therefore separates all the $d_{\mathbb{R}^2}$-closed subsets of $X = \mathbb{R}^2$. This implies the convergence of the Koopman approximations in $H_\Omega$ in this case also. However, it is known that the Gaussian kernel does not separate all the $d_{\mathbb{R}^2}$-closed subsets of $X := \mathbb{R}^2$ [58]. We would require additional analysis to conclude convergence in this case since Theorem 2 does not apply.

Here we examine the numerical results of approximating the function $g(t) = \sin(\phi_1(t) + \phi_2(t))^3$ using the Matern-Sobolev kernel with $m = 3/2$. As mentioned in [52], measuring time in nanoseconds and the currents $\phi_1$ and $h(x_1)$ in mA with parameters $C$, $L$, and $u$ as given in Example 1, the equations of motion for the state dynamics are given by φ(t) = 0.5(−h(ϕ As mentioned in [52], this system has two stable equilibria and one unstable equilibrium. From initial condition $\phi_0$, the trajectory converges to the stable equilibrium at $\{0.884, 0.21\}^T$. The simulation is run over 7 ns. Two estimates of $U_\phi g$ are calculated. The first and second estimates use 3 and 13 kernel centers, respectively, spaced quasi-uniformly along the orbit. Both estimates place centers between times $t_1 = 1$ ns and $t_2 = 7$ ns, as demonstrated in Figure 2. Figure 3 illustrates the behavior of the estimates $U^{n_1}_\phi$ and $U^{n_2}_\phi$ of the orbital Koopman operator $U_\phi$ acting on $g$. From the figure, it is clear that the error between the estimate and the true operator acting on $g$ is low near the centers placed between $t_1$ and $t_2$. Additionally, we can see that the estimate $U^{n_2}_\phi g$ using the larger number of centers has a much smaller error than $U^{n_1}_\phi g$ since $h_{\Omega_{n_2},\Omega} < h_{\Omega_{n_1},\Omega}$. To better understand the effects of the fill distance on the reduction in error, we examine the bounds on the error as given by the orbital Koopman operator's version of Theorem 6 (see Section 5). In this case, the dimension of the ambient space is $d = 2$, and the dimension of the restricted set along the orbit is $k = 1$. For each estimate, kernels are placed quasi-uniformly throughout the orbit from the initial time of $t_1 = 0$ to the final run time of $t = 9$ ns. When using the Sobolev-Matern kernel of order $m = 3/2 > 1$ and choosing $s = m - (d - k)/2 > 0$ and $s \le \lceil t \rceil - 1$, the bound on the error decreases as the fill distance $h_{\Omega_n,\Omega}$ is reduced, as seen in Figure 4.

Fig. 2: From initial condition $\phi_0 = \{-0.2, 1.6\}^T$, the input trajectory $\phi(t)$ is generated over 7 ns. Two estimates are generated using two different collections of centers $\Omega_{n_1}$ and $\Omega_{n_2}$, which are placed quasi-uniformly along the orbit from times $t_1 = 1$ ns to $t_2 = 6$ ns.

Example 9
We next turn to the study of the Lotka-Volterra Equations in Example 3, which we interpret as an example that determines a flow on a manifold M .We prepare this example with a careful discussion of a continuous semiflow that can be associated with the governing equations.
In writing the original equations, we are given a set of nonlinear ODEs over the state space $X := \mathbb{R}^2$. It is relatively straightforward to show that every initial condition $\phi_0$ in the open first quadrant generates a continuous solution of these equations, one that is defined for all $t \in \mathbb{R}^+$. The solutions depend continuously on the initial conditions. These properties suffice to determine a continuous semigroup $\{S(t)\}_{t \in \mathbb{R}^+}$ in the usual way [75] from the ODEs, provided we can define a suitable domain of initial conditions. We turn to this question next and seek to characterize a positive invariant set contained in the open first quadrant. There is a single equilibrium in the open first quadrant located at $\phi_e := \{1, 2\}$. Moreover, it is also known and easy to verify [53] that the trajectory $t \mapsto \phi(t)$ passing through an initial condition $\phi_0$ in the open first quadrant leaves a function $V$ invariant. That is, we have $V(\phi(t)) = V(\phi_0)$ for all $t \in \mathbb{R}^+$ for any initial condition in the open first quadrant. We can define a compact set $N \subset \mathbb{R}^2$ as a closed sublevel set of $V$ for any constant $\alpha$, and this set is invariant under the solutions of the ODEs. The fact that $N$ is closed follows from the fact that it is the inverse image of a closed set under a continuous mapping. If we set $S(t)\phi_0 := \phi(t)$ for each initial condition $\phi_0 \in N$, we obtain a continuous semigroup $\{S(t)\}_{t \in \mathbb{R}^+}$ over the compact, closed (complete) metric space $(N, d_N) := (N, d_{\mathbb{R}^2})$. Here $d_{\mathbb{R}^2}$ is the usual Euclidean distance function restricted to the set $N$.

Fig. 3: An illustration of the behavior of the estimates $U^{n_1}_\phi$ and $U^{n_2}_\phi$ of the orbital Koopman operator $U_\phi$ acting on $g$. The error between the estimate and the true operator acting on $g$ is smaller near the centers placed between $t_1$ and $t_2$. From the figure, it is also evident that the error in the estimate $U^{n_2}_\phi g$ is smaller than that of $U^{n_1}_\phi g$ since $h_{\Omega_{n_2},\Omega} < h_{\Omega_{n_1},\Omega}$.
Next we describe how a dynamical system on a manifold $M$ is defined from the semiflow $\{S(t)\}_{t \in \mathbb{R}^+}$ on $N$. The Jacobian of $V$ is denoted $DV$, and the determinant $|DV|$ is nonzero for every point in the open first quadrant except the equilibrium $\phi_e$, where it is zero. By the implicit function theorem this means that the set $M \subset N$, where $M := M(\phi_0) := \{x \in \mathbb{R}^2 \mid V(x) = V(\phi_0)\}$ with $\phi_0 \in N$ and $\phi_0 \ne \phi_e$, is a smooth, connected, regularly embedded submanifold of $\mathbb{R}^2$. Any smooth manifold can be equipped with a Riemannian metric, and here we just assume that such a metric has been chosen. We define the mapping $T(t) : M \to M$ by restriction, $T(t) := S(t)|_M$. The operators $\{T(t)\}_{t \in \mathbb{R}^+}$ satisfy the algebraic semigroup properties since the semigroup $\{S(t)\}_{t \in \mathbb{R}^+}$ does and $M$ is invariant under $\{S(t)\}_{t \in \mathbb{R}^+}$. Finally, since $M$ is a regularly embedded submanifold of $\mathbb{R}^2$, we know that there are two constants $C_1, C_2 > 0$ such that $C_1 d_{\mathbb{R}^2}(x, y) \le d_M(x, y) \le C_2 d_{\mathbb{R}^2}(x, y)$ for all $x, y \in M$ [59]. The fact that the metric $d_M$ on $M$ is equivalent to the usual Euclidean metric restricted to $M$ suffices to show that $\{T(t)\}_{t \in \mathbb{R}^+}$ is a continuous semiflow on the complete metric space $(M, d_M)$. Of course, it is clear from this analysis that the manifold $M$ is just the orbit $\Gamma^+(\phi_0)$ equipped with a Riemannian metric. We could use the definition of the semiflow $\{T(t)\}_{t \in \mathbb{R}^+}$ above to define the orbital Koopman operator $U_\phi$ for any initial condition $\phi_0 \in M$, and apply our approximation results to the associated orbital Koopman operator $U_\phi$. However, we have already carried this out for an orbital Koopman operator in the last example, and now we consider the case of a discrete flow defined in terms of a mapping $f : M \to M$ and approximate the usual Koopman operator $U_f$. We can associate a discrete flow with the semigroup $\{T(t)\}_{t \in \mathbb{R}^+}$ on $M$ by sampling. Suppose that $t_i := (i - 1)h$ for some fixed time step $h > 0$.
We set $\phi_{i+1} = f(\phi_i)$ with $f(\phi) := T(h)\phi$, and we obtain a discrete evolution of the type studied in Equation 1, but now the system evolves on a manifold $M$. By construction the function $f$ maps $M$ into $M$. Note that even though the governing ODEs constitute a classical case study, the function $f : M \to M$ is not known in closed form. In this sense, the approximation of the Koopman operator $U_f$ is a nontrivial question even for this rather straightforward system. In the remainder of this example we employ an intrinsic method of approximation. We choose a kernel $K : M \times M \to \mathbb{R}$, one that is defined explicitly over the manifold, to construct approximations of the Koopman operator.
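Since the map $f = T(h)$ is not available in closed form, in practice its values must be generated numerically. The sketch below illustrates one way this might be done, and checks that the samples stay on a level set of a first integral. It rests on several assumptions that are not taken from the paper: a standard Lotka-Volterra form $\dot{x}_1 = x_1(a - b x_2)$, $\dot{x}_2 = x_2(c x_1 - d)$ with hypothetical parameters chosen only so that the equilibrium is at $\{1, 2\}$, the associated first integral $V(x) = c x_1 - d \ln x_1 + b x_2 - a \ln x_2$, and a classical fourth-order Runge-Kutta integrator with substeps standing in for the exact flow.

```python
import math

# Hypothetical parameters, chosen only so that the equilibrium is (d/c, a/b) = (1, 2).
A, B, C, D = 2.0, 1.0, 1.0, 1.0


def vector_field(x):
    """Assumed Lotka-Volterra right-hand side."""
    x1, x2 = x
    return (x1 * (A - B * x2), x2 * (C * x1 - D))


def first_integral(x):
    """First integral V of the assumed equations; constant along solutions."""
    x1, x2 = x
    return C * x1 - D * math.log(x1) + B * x2 - A * math.log(x2)


def rk4_step(x, dt):
    """One classical Runge-Kutta step of size dt."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = vector_field(x)
    k2 = vector_field(shifted(x, k1, dt / 2))
    k3 = vector_field(shifted(x, k2, dt / 2))
    k4 = vector_field(shifted(x, k3, dt))
    return tuple(
        xi + dt / 6 * (a + 2 * b + 2 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    )


def discrete_map(x, h=0.1, substeps=100):
    """Numerical stand-in for f(x) = T(h) x, using RK4 substeps."""
    dt = h / substeps
    for _ in range(substeps):
        x = rk4_step(x, dt)
    return x


def orbit_samples(x0, count, h=0.1):
    """Samples phi_1, ..., phi_count with phi_{i+1} = f(phi_i)."""
    samples = [tuple(x0)]
    for _ in range(count - 1):
        samples.append(discrete_map(samples[-1], h))
    return samples


samples = orbit_samples((2.0, 2.0), 20)
drift = max(abs(first_integral(s) - first_integral(samples[0])) for s in samples)
```

The small drift in $V$ indicates that the numerically generated samples remain, to integration accuracy, on the level set $M(\phi_0)$ on which the approximations are constructed.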
We will estimate the Koopman operator in terms of the Sobolev spaces $W^{m,2}(M)$. Recall that if the dimension of $M$ is $k$, then $W^{m,2}(M)$ is continuously embedded in $C^0(M)$ if $m > k/2$. Since $M$ is a one dimensional manifold, choosing $m > 1/2$ suffices to guarantee that $W^{m,2}(M)$ is a RKH space. We choose the Sobolev-Matern kernel $K_m : M \times M \to \mathbb{R}$ for some $m > 1/2$. For some standard manifolds, the Sobolev-Matern kernels are known in closed form. For the problem at hand, since the manifold $M$ is not one of the standard examples, determining the Sobolev-Matern kernel is difficult. We must solve for the fundamental solution of the order $2m$ elliptic differential operator $L := \sum_{j=0}^{m} (\nabla^j)^* \nabla^j$, which is the $m$th order Laplace-Beltrami operator on the manifold $M$. In this equation $\nabla$ is the covariant derivative of the Riemannian manifold. On solving for the Sobolev-Matern kernel $K_m : M \times M \to \mathbb{R}$, we can apply Theorem 7 to bound the pointwise error.

Before proceeding to the next example, we emphasize some key points about this intrinsic approximation method. To realize this rate of convergence in applications, we would first have to solve for the Matern-Sobolev kernel over the manifold $M$. Even though we know a great deal about the manifold $M$ in this case, such as the fact that it is a level set of a smooth function, the solution for the Matern-Sobolev kernel is nontrivial since the volume measure over the manifold is difficult to derive. In other problems the situation can be much worse. The closed form representation of the manifold $M$ is usually not known. In such cases it is impossible to construct the Sobolev-Matern kernels over a manifold $M$, even in principle. Even if the manifold is known, the calculation of the Matern-Sobolev kernels can be prohibitively complicated. This would amount to 1) finding an explicit (finite) atlas for the manifold, 2) using the coordinate charts to express the representations of the Laplace-Beltrami operator in each chart domain, and 3) solving the coordinate expressions for the kernel in each chart domain. This is a highly nontrivial exercise in all but the simplest domains.

This would appear to be a significant barrier to using the bounds derived in the paper in some important (data-driven) applications: completing these calculations might require heroic efforts. Fortunately, we discuss an extrinsic example next which does not suffer from these issues.
Example 10 From the last example, it is clear that the theorems derived in the paper in principle define strong bounds on the error in approximations of different types of Koopman operators. However, it is also clear that coming up with realizations of the kernels that achieve these rates of convergence can be problematic. Using the same equations as the previous example, we now describe an extrinsic method for constructing Koopman approximations that realize the error bounds described in the paper. The discrete dynamics defined in terms of the function $f := T(h) : M \to M$ give rise to the Koopman operator $U_f$. The manifold $M$ is a smooth, one dimensional, closed, connected Riemannian manifold that is regularly embedded in the Euclidean space $X := \mathbb{R}^2$. It turns out that, since $X := \mathbb{R}^2$ is such a common and widely used manifold, the Sobolev-Matern kernel $K_m$ of order $m$ over $X := \mathbb{R}^2$ is well known. The Sobolev-Matern kernel of order $m$ over $X := \mathbb{R}^d$ is given in [59] in closed form in terms of $B_{m-d/2}$, the Bessel function of order $m - d/2$. This kernel is easily calculated for any $x, y \in \mathbb{R}^d$, and in particular it can be computed over a nontrivial or unknown submanifold $M$ of $X := \mathbb{R}^d$ since it is known in closed form on the larger set $X$.
As discussed in Section 2.2, the restricted kernel $r_M(x, y) := K_m|_{M \times M}(x, y)$ defines the kernel for the RKH space $R_M(H_X)$. We want to choose the kernel $K_m$ with sufficiently high order to ensure that $R_M(H_X)$ is equivalent to a Sobolev space $W^{s,2}(M)$ for an appropriate degree of smoothness $s > 0$. Recall that if the kernel $K_m$ over $X := \mathbb{R}^d$ has a Fourier transform that satisfies the algebraic decay condition of order $m$ in Equation 26 with $m > d/2$, then $H_X \approx W^{m,2}(\mathbb{R}^d)$. But the Sobolev-Matern kernel $K_m$ defines the Sobolev space $W^{m,2}(\mathbb{R}^d)$: it automatically satisfies the decay condition in Equation 26 by definition. Moreover, when $m > d/2$, the Sobolev embedding theorem ensures that $W^{m,2}(\mathbb{R}^d) \hookrightarrow C^0(\mathbb{R}^d)$, and the trace map $T$ is just the restriction map on these continuous functions. Recall from Proposition 2 that the trace (or restriction) map is a continuous mapping of $W^{m,2}(\mathbb{R}^d)$ onto $W^{s,2}(M)$ when $s := m - (d - k)/2 > 0$. With all of these observations, we conclude that if $m > d/2$ and $s = m - (d - k)/2 > 0$, then $R_M(H_X) \approx W^{s,2}(M)$. For the problem at hand, where $d = 2$ and $k = 1$, this means that we must choose $m > 1$ and $s = m - 1/2 > 0$.
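The bookkeeping in this paragraph can be made concrete with a small sketch. The first function below computes the smoothness index $s = m - (d - k)/2$ of the restricted space under the condition $m > d/2$. The second evaluates the radial profile of a Matern-type kernel, assuming the standard form proportional to $r^{\nu} B_{\nu}(r)$ with $\nu = m - d/2$ and with $B_\nu$ taken to be the modified Bessel function of the second kind. For half-integer $\nu$ this profile has elementary closed forms, which are used here in place of a Bessel function routine. The normalization to one at $r = 0$, the unit length scale, and the function names are assumptions of this sketch and may differ from the expression in [59].

```python
import math


def trace_smoothness(m, d, k):
    """Index s = m - (d - k)/2 of the Sobolev space on a k-dimensional
    submanifold of R^d obtained by restricting W^{m,2}(R^d)."""
    if m <= d / 2:
        raise ValueError("the embedding into continuous functions needs m > d/2")
    return m - (d - k) / 2


def matern_profile(r, nu):
    """Radial profile proportional to r^nu * B_nu(r), normalized to 1 at r = 0.

    Only the half-integer orders 1/2, 3/2 and 5/2 are covered, using the
    elementary closed forms of the modified Bessel function.
    """
    if nu == 0.5:
        poly = 1.0
    elif nu == 1.5:
        poly = 1.0 + r
    elif nu == 2.5:
        poly = 1.0 + r + r * r / 3.0
    else:
        raise ValueError("only nu in {0.5, 1.5, 2.5} is implemented")
    return poly * math.exp(-r)


def restricted_kernel(x, y, m, d):
    """Kernel on R^d evaluated at two points, e.g. points of a submanifold M."""
    return matern_profile(math.dist(x, y), m - d / 2)


# Setting of this example: ambient dimension d = 2, manifold dimension k = 1.
s_value = trace_smoothness(2.5, 2, 1)
k_value = restricted_kernel((2.0, 2.0), (2.0, 2.5), m=2.5, d=2)
```

For the order $m = 5/2$ in the plane, this gives Bessel order $\nu = 3/2$ and restricted smoothness $s = 2$ on the one dimensional manifold.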
For this example, the trajectory was generated from an orbit starting at the initial condition $\phi_0 = \{2, 2\}^T$ over 12.8 seconds. The approximation was constructed using the Sobolev-Matern kernel of order $m = 5/2$. The dimension of the approximation space is determined by the number of kernels $n = 11$, whose centers are selected quasi-uniformly along the manifold. Figure 5 shows the approximation $U^n_f$ and the Koopman operator $U_f$ acting on a function $g$ over the input defined by the Lotka-Volterra dynamics. From the figure, it is clear that the error is minimal along the manifold $M$ where samples are collected. We also examine the rates of convergence given by Theorem 7 in Figure 6. The number of centers ranges from $n = 5$ to $n = 26$ based on the desired fill distance. The error is bounded by a function of the fill distance, which decreases as the number of kernel centers increases.

Example 11
In this example we study the Van Der Pol oscillator described in Equations 2. First, the governing system of ODEs in Example 2 generates a continuous, bounded solution through any initial condition $\phi_0 \in X := \mathbb{R}^2$, and the solutions depend continuously on the initial conditions. Standard methods in dynamical system theory [72,75] ensure that the governing equations define a continuous semigroup $\{S(t)\}_{t \in \mathbb{R}^+}$ on the entire state space $X := \mathbb{R}^2$. The positive limit set $\omega^+(\phi_0)$ is the same for any choice of initial condition $\phi_0 \in X$. That is, for any pair $\phi_0, \psi_0 \in X$, we have $\omega^+(\phi_0) = \omega^+(\psi_0)$. Since the forward orbit is precompact in $X := \mathbb{R}^2$, the positive limit set $\omega^+(\phi_0)$ is compact, connected, and positively invariant. Moreover, following essentially the same set of steps that are taken in Example 9, we can likewise show that $M := \omega^+(\phi_0)$ is a smooth, one dimensional, regularly embedded submanifold of $X := \mathbb{R}^2$, and that the restricted semigroup $T(t) := S(t)|_M$ is a $d_M$-continuous semigroup over $(M, d_M)$. Bounds similar to those derived in Example 9 could be established in the current case for any initial condition $\phi_0 \in M$. However, in this example, we want to explore the case when initial conditions are outside of the positive limit set and generate substantially different trajectories. We are interested in precise descriptions of relationships between Koopman approximations that arise from samples $\Xi(\phi_0)$ and $\Xi(\psi_0)$ that are collected from different orbits $\Gamma^+(\phi_0)$ and $\Gamma^+(\psi_0)$, when $\phi_0 \ne \psi_0$ and the initial conditions are outside of $M := \omega^+(\phi_0) = \omega^+(\psi_0)$.
As in Example 9, we induce a discrete dynamical system by discretization of the semiflow {S(t)} t∈R + , with ϕ n+1 = S(h)ϕ n = f (ϕ n ) and f : X → X an unknown mapping. We subsequently define U f g := g ∘ f in the usual way for g ∈ H X . We denote by ω + f (ϕ 0 ) and ω + f (ψ 0 ) the positive limit sets for the discrete flow to distinguish them from the positive limit set M of the semigroup {S(t)} t∈R + . The positive limit sets of the discrete flows are contained in that of the semigroup in continuous time, that is, ω + f (ϕ 0 ) ⊆ M and ω + f (ψ 0 ) ⊆ M . We write Ω ϕ 0 and Ω ψ 0 for the limiting sets of the two collections of samples Ξ(ϕ 0 ) and Ξ(ψ 0 ), and we denote by U n f,ϕ0 g := (P Ω ϕ 0 ,n g) ∘ f and U n f,ψ0 g := (P Ω ψ 0 ,n g) ∘ f the projection-based approximations generated by the samples Ξ(ϕ 0 ) and Ξ(ψ 0 ), respectively. Adding and subtracting U f g and applying the triangle inequality, it is then immediately apparent that ∥U n f,ϕ0 g − U n f,ψ0 g∥ f * (H X ) ≤ ∥U f ∥ ( ∥(I − P Ω ϕ 0 ,n )g∥ H X + ∥(I − P Ω ψ 0 ,n )g∥ H X ), since both U n f,ϕ0 and U n f,ψ0 are maps from H X into f * (H X ). Now we can apply Theorem 2 to conclude that ∥U n f,ϕ0 g − U n f,ψ0 g∥ f * (H X ) → 0 as n → ∞ for each g ∈ H Ω ϕ 0 Ω ψ 0 . In other words, the projection-based approximations of the Koopman operator asymptotically agree for all functions g ∈ H Ω ϕ 0 Ω ψ 0 . Not only does this agree with our intuition, but it also describes explicitly the space in which the approximations converge.
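The bound above can be unpacked in a few lines. The derivation below is a sketch under two assumptions: the projection-based approximation is read as U n f,ϕ0 g = U f (P Ω ϕ 0 ,n g), and U f is taken to be bounded from H X into f * (H X ), which the appearance of ∥U f ∥ in the text suggests.

```latex
\begin{align*}
U^n_{f,\phi_0} g - U^n_{f,\psi_0} g
  &= U_f P_{\Omega_{\phi_0},n}\, g - U_f P_{\Omega_{\psi_0},n}\, g \\
  &= U_f \bigl[ (I - P_{\Omega_{\psi_0},n})\, g - (I - P_{\Omega_{\phi_0},n})\, g \bigr], \\[4pt]
\bigl\| U^n_{f,\phi_0} g - U^n_{f,\psi_0} g \bigr\|_{f^*(H_X)}
  &\le \| U_f \| \Bigl( \bigl\| (I - P_{\Omega_{\phi_0},n})\, g \bigr\|_{H_X}
     + \bigl\| (I - P_{\Omega_{\psi_0},n})\, g \bigr\|_{H_X} \Bigr).
\end{align*}
```

Each term on the right is a projection error for a single sample set, so the difference of the two approximations vanishes whenever both projection errors vanish for the chosen g.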

Example 2
Figure 1(b) depicts trajectories for several initial conditions of the flow that is governed by the Van Der Pol oscillator equations.

A flow having little apparent underlying structure. The positive limit set is a collection of attractive equilibria circled in red. Here samples are taken along a few trajectories during transient regimes.
A flow exhibiting considerably more structure. Each orbit generated by an initial condition in the upper right quadrant is an embedded submanifold of R 2 .

Fig. 1: A collection of flows exhibiting various kinds of underlying structure. (a) Phase portrait of Example 1, from [52], page 46, Example 2.1. (b) Phase portrait of Example 2, the Van Der Pol oscillator. (c) Phase portrait of Example 3, the Lotka-Volterra predator-prey equations, from [53], page 4. (d) Phase portrait of simple pendulum dynamics and samples that are taken from a discrete numerical integrator with an invariant flow along a manifold defined by the dynamics.

4.2 Data-Dependent Approximations U n f and U n P Ωn f

Fig. 4: An illustration of the convergence of the estimate U n ϕ to the orbital Koopman operator U ϕ acting on g. Here the error is given by sup t∈T |U ϕ g(t) − U n ϕ g(t)|. The dashed line represents the convergence rate.

Fig. 5: A comparison between the approximation U n f g and the infinite-dimensional Koopman operator U f g over the manifold M of the underlying Lotka-Volterra dynamics. The approximation U n f g, represented by the blue mesh, intersects the Koopman operator U f g, represented by the orange surface, at the points in red collected above the manifold.

Fig. 6: Examining the rates of convergence given by Theorem 7. The number of centers ranges from n = 5 to n = 26 based on the desired fill distance. The error sup x∈M |(U f g)(x) − (U n f g)(x)| is bounded by a function of the fill distance.