Three different entropies, variational principle and the degree formula.

My inaugural blog post, hurray!! This post is based on a mandatory assignment in a course in complex dynamics held at the University of Oslo in 2018. As sources we mainly use the book of Walters [1] and handouts. Some of the more basic results will, due to time constraints, be left unproven, but can all be found in [1].

– Introduction –

The notion of entropy, first formulated in the context of thermodynamics, is a quantity meant to capture the complexity or chaoticity of the state of a system. It is perhaps not surprising that such a broadly defined notion spread rapidly through different subfields of applied mathematics and information theory. Here, three different definitions of entropy are introduced. Two of them turn out to be equivalent on compact metric spaces, and all three are related, on compact metric spaces, by the so-called Variational Principle.
Lastly, we prove the theorem of Misiurewicz, Przytycki and Gromov, which relates the entropy of a holomorphic map on the $ n$-sphere to its degree.

– The three entropies –

We give the definitions in chronological order.

– Measure theoretic entropy –

Let $ (X, \mathcal{B})$ be a measurable space, and $ T:X\to X$ a measurable map. The collection of all probability measures will be denoted $M(X)$, and the T-invariant probability measures, that is, measures $\mu \in M(X)$ for which $\mu(A) = \mu(T^{-1}A)$ for all measurable $A$, will be denoted $ M(X, T)$. A measurable partition is defined as a collection $ \xi \subset \mathcal{B}$ of disjoint measurable sets that cover $ X$. We define the join of two measurable partitions as
\[
\xi \vee \eta = \{ C\cap D ~ | ~ C \in \xi, ~ D\in \eta \}
\]

The entropy of a measurable partition is defined as the (generalized) sum
$$H_\mu(\xi) = -\sum_{C\in \xi} \mu(C)\log(\mu(C))$$
with the conventions that $\mu(C)\log(\mu(C)) = 0$ if $\mu(C) = 0$, and that $H_\mu(\xi) = \infty$ if the union of all sets of measure zero in $ \xi$ has non-zero measure.
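As a small numerical illustration of the formula above, here is a minimal sketch that computes $H_\mu(\xi)$ for a finite partition described only by the measures of its cells. The function name `partition_entropy` and the sample values are illustrative choices, not part of the original text.

```python
import math

def partition_entropy(masses):
    """H_mu(xi) for a finite partition whose cells have the given measures."""
    if any(p < 0 for p in masses) or abs(sum(masses) - 1.0) > 1e-12:
        raise ValueError("cell measures must be non-negative and sum to 1")
    # Convention 0*log(0) = 0: cells of measure zero are skipped.
    return -sum(p * math.log(p) for p in masses if p > 0)

# Illustrative inputs (hypothetical): a uniform partition, a partition
# with a null cell, and the trivial partition {X}.
uniform = partition_entropy([0.25] * 4)
with_null_cell = partition_entropy([0.5, 0.5, 0.0])
trivial = partition_entropy([1.0])
```

The uniform partition into four cells attains the maximal value $\log(4)$, a null cell does not change the entropy, and the trivial partition has entropy zero.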

Now the entropy of T with respect to a T-invariant probability measure $ \mu$ and a partition $ \xi$ with $H_\mu(\xi) < \infty$ is defined as

$$h_{\mu}(T, \xi) = \inf_{n\in \mathbb{N}} \left\{ \frac{1}{n} H_\mu\left(\bigvee^{n-1}_{k=0}T^{-k}\xi \right) \right\} = \lim_{n\to\infty} \frac{1}{n} H_\mu\left(\bigvee^{n-1}_{k=0} T^{-k}\xi\right).$$

It should be noted that some authors define a measurable partition as a collection of measurable sets whose pairwise intersections have measure zero and which cover the space almost everywhere. Personally, I prefer the definition of a measurable partition to be independent of the choice of measure. It is also customary to remove sets of measure zero instead (as an alternative to the convention $0\log(0) = 0$ we adopt here), but this again requires a choice of measure, which is unfortunate.

The map $ \mu \mapsto h_\mu(T)$ on $M(X, T)$, with $h_\mu(T)$ as in Definition 1 below, is called the entropy map. It is affine, but in general not continuous.

Definition 1 (Measure Theoretic Entropy)
The measure theoretic entropy of a measurable map $T$ and a measure $\mu\in M(X,T)$ is defined as $$h_\mu(T) = \sup\{ h_\mu(T, \xi) ~|~ H_\mu(\xi) < \infty \}.$$

We will also need the following definition in the proof of the Variational Principle below.

Definition 2 (Conditional Entropy)
Let $ \xi = \{C_1, \ldots, C_n \}$ and $\nu = \{B_1, \ldots, B_m \}$ be two measurable partitions of $ X$. We define the conditional entropy of $ \xi$ given $ \nu$ as
$$H_\mu(\xi | \nu) = - \sum_{i,j} \mu(C_j\cap B_i) \log\left(\frac{\mu(B_i\cap C_j)}{\mu(B_i)}\right),$$ where terms with $\mu(B_i\cap C_j) = 0$ are omitted.
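To make the definition concrete, the following sketch evaluates the conditional entropy from a table of the measures $\mu(B_i \cap C_j)$ and compares it with the standard chain rule $H_\mu(\xi \vee \nu) = H_\mu(\nu) + H_\mu(\xi | \nu)$. The table `joint` and the helper names are hypothetical values chosen for illustration only.

```python
import math

def entropy(masses):
    """Entropy of a finite partition given the measures of its cells."""
    return -sum(p * math.log(p) for p in masses if p > 0)

def conditional_entropy(joint):
    """H_mu(xi | nu), where joint[i][j] = mu(B_i ∩ C_j)."""
    total = 0.0
    for row in joint:          # row i lists mu(B_i ∩ C_j) over all j
        mass_b = sum(row)      # mu(B_i)
        for p in row:
            if p > 0:          # terms of measure zero are omitted
                total -= p * math.log(p / mass_b)
    return total

# Hypothetical table: nu has three cells (rows), xi has two cells (columns).
joint = [[0.10, 0.30], [0.25, 0.05], [0.20, 0.10]]
h_join = entropy([p for row in joint for p in row])       # H(xi v nu)
h_nu = entropy([sum(row) for row in joint])               # H(nu)
h_xi = entropy([sum(col) for col in zip(*joint)])         # H(xi)
h_cond = conditional_entropy(joint)                       # H(xi | nu)
```

In this example `h_join` agrees with `h_nu + h_cond` up to rounding, and `h_cond` does not exceed `h_xi`, in line with the intuition that conditioning cannot increase entropy.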

 

– Topological Entropy –

Analogously, we may define a notion of entropy in the category of compact topological spaces, which we will call the topological entropy. It was first given in the article of Adler et al. [2]. For this we need the following preliminary definitions (here $ X$ will denote a compact space).

For any open cover $ \mathcal{U}$ of $X$ the quantity $ N(\mathcal{U})$ is the minimal cardinality of any subcover of $\mathcal{U}$; it is finite since $X$ is compact. We define the entropy of the cover $\mathcal{U}$ to be the number $$H_{top}(\mathcal{U}) = \log(N(\mathcal{U})).$$

The join of two open covers $ \mathcal{U}, \mathcal{W}$ of $ X$ is defined as the open cover $$ \mathcal{U}\vee \mathcal{W} = \{ U\cap W ~ | ~ U\in \mathcal{U}, ~ W\in \mathcal{W} \} .$$

The topological entropy of a continuous map $ \phi: X\to X$, with respect to the open cover $ \mathcal{U}$ of $ X$ is defined as the number
$$h_{top}(\phi, \mathcal{U}) = \lim_{n\to \infty}\frac{1}{n} H_{top}\left(\bigvee^{n-1}_{i=0} \phi^{-i}\mathcal{U}\right). $$

Definition 3 (Topological Entropy)
The entropy of a continuous map $\phi: X\to X$ on a compact topological space $X$ is defined as the supremum
$$h_{top}(\phi) = \sup \{h_{top} (\phi, \mathcal{U}) ~| ~ \mathcal{U} \text{ is an open cover of }X \}. $$

Though we will not delve into this, it should be mentioned that there are more definitions of topological entropy in the literature for general topological spaces which all reduce to the above definition if the space is compact.

– Metric Entropy –

 

Lastly we have an entropy which (to the best of my knowledge) was first introduced in [3] by Bowen. It makes explicit use of a metric, hence the name. Some authors call it the topological entropy, since, as we will see, they are actually equivalent on compact metric spaces.
Let $(X, d)$ be a compact metric space, and $T: X\to X$ a continuous map. To define the metric entropy, we need the following preliminary definitions

We define the metric $ d_T^n $ on $X$ by
$$d_T^n(x, y )= \max_{0\leq i \leq n-1} d(T^ix, T^iy).$$
The metrics $ d$ and $ d_T^n$ are (uniformly) equivalent. Why?

$ B_r^{d_T^n}(x) \subset B_r^{d}(x)$ since $d \leq d_T^n$. Conversely, since $ X$ is compact, the maps $ T^i$, $0 \leq i \leq n-1$, are uniformly continuous, hence for every $r > 0$ there exists a $ \delta > 0$, independent of $x$, such that $ B_\delta^{d}(x) \subset B_{r}^{d_T^n}(x)$.

A collection $ F \subset X$ of points is said to be $(n, \epsilon)$-separated if $$d^n_T(x,y) \geq \epsilon \qquad \text{for all }x,y \in F, \text{ with } x\neq y. $$ By compactness this set must be finite.
We let $N_d(T, \epsilon, n)$ denote the maximum cardinality of all $ (n, \epsilon)$-separated subsets of $ X$. This number is necessarily finite.
Why?

If $ U$ is a finite cover of $ X$ by balls of $ d^n_T$-radius less than $ \epsilon/2$, then by the triangle inequality each ball contains at most one point of any $ (n, \epsilon)$-separated set, so the cardinality of such a set is at most the number of elements in the cover $ U$. A finite cover of this kind exists because $d_T^n$ and $ d$ are equivalent, so that $X$ is also compact with respect to $d_T^n$.

Define the quantity $h_d(T, \epsilon)$ to be
$$h_d(T,\epsilon) = \limsup_{n\to \infty} \frac{1}{n} \log(N_d(T, \epsilon, n)).$$

Definition 4 (Metric Entropy)
The metric entropy is defined as the limit
$$h_d(T) = \lim_{\epsilon\to 0} h_d(T,\epsilon).$$

$ h_d(T)$ is well defined by the following lemma.

Lemma 1 If $ \epsilon_1 < \epsilon_2$ then $ N_d(T, \epsilon_2, n) \leq N_d(T, \epsilon_1, n) < \infty$. Hence $ h_d(T, \epsilon)$ is non-negative (by inspection) and monotone non-decreasing as $ \epsilon$ decreases to $0$, so the limit $h_d(T)$ exists in $[0, \infty]$.

Proof
First we show finiteness: Since $ T^i $ is uniformly continuous for each $0 \leq i \leq n-1$, there is, for any $\epsilon >0$, a $\delta >0 $ such that $ d(T^ix, T^iy) < \epsilon $ for all $ 0 \leq i \leq n-1$ whenever $ d(x, y ) < \delta$. A ball of radius $\delta/2$ therefore contains at most one point of an $(n, \epsilon)$-separated set, so $ N_d(T, \epsilon, n)$ is at most the number of $\delta/2$-balls needed to cover $ X$ (finite since $X$ is compact). For the monotonicity, assume $ \epsilon_1 < \epsilon_2$. If $F\subset X$ is $ (n, \epsilon_2)$-separated, then a fortiori it must be $ (n, \epsilon_1)$-separated, which implies that $ N_d(T, \epsilon_2, n) \leq N_d(T, \epsilon_1, n)$.
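To get a feeling for how $N_d(T, \epsilon, n)$ grows with $n$, here is a rough numerical sketch for the doubling map $x \mapsto 2x \bmod 1$ on the circle $\mathbb{R}/\mathbb{Z}$. This example is not taken from the text above. The code builds an $(n, \epsilon)$-separated set greedily from a dyadic grid, so it only produces a lower bound for $N_d(T, \epsilon, n)$; the grid size, the value of $\epsilon$ and all function names are assumptions made for illustration.

```python
import math

def circle_dist(a, b):
    """Distance between two points of the circle R/Z."""
    gap = abs(a - b)
    return min(gap, 1.0 - gap)

def orbit(x, n):
    """The points x, Tx, ..., T^(n-1)x for the doubling map T."""
    points = []
    for _ in range(n):
        points.append(x)
        x = (2.0 * x) % 1.0
    return points

def greedy_separated_count(n, eps, grid_size=1024):
    """Size of a greedily chosen (n, eps)-separated subset of a dyadic grid.

    This is a lower bound for N_d(T, eps, n), not necessarily its value.
    """
    chosen = []  # orbit segments of the points selected so far
    for k in range(grid_size):
        cand = orbit(k / grid_size, n)
        # d_T^n(x, y) is the maximum distance along the two orbit segments.
        if all(max(circle_dist(u, v) for u, v in zip(cand, other)) >= eps
               for other in chosen):
            chosen.append(cand)
    return len(chosen)

eps = 0.25
counts = {n: greedy_separated_count(n, eps) for n in range(1, 7)}
rates = {n: math.log(c) / n for n, c in counts.items()}
```

On this grid the greedy counts double with every step in $n$, so the quantities $\frac{1}{n}\log(\text{count})$ decrease towards $\log(2)$, which is consistent with the known value $\log(2)$ for the entropy of the doubling map.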

– Alternative Definitions –

We can also treat the topological and measure theoretic entropies as limits over appropriate directed index sets. Let’s see how this plays out in the case of topological entropy, bearing in mind that the same could just as well be done with the measure theoretic entropy.

With $X$ a compact Hausdorff space, the family $$O = \{ \mathcal{U} ~ | ~ \mathcal{U} \text{ is an open cover of } X \}$$ can be made into a directed set with respect to the order relation of refinement, that is $\mathcal{U}_1 \prec \mathcal{U}_2$ if $\mathcal{U}_2$ is a refinement of $\mathcal{U}_1$. It is not hard to check that $\prec$ is a preorder (reflexive and transitive) on the collection of open covers, which is all a directed set requires, and for any open covers $\mathcal{U}_1$ and $ \mathcal{U}_2$ we have $$ \mathcal{U}_1, \mathcal{U}_2 \prec \mathcal{U}_1\vee \mathcal{U}_2.$$

The same can be done for the measure theoretic entropy with the order relation $\xi\prec \eta$ if for all $U\in \eta $ there is some $V\in \xi$ such that $U\subset V$. Again $\xi, \eta \prec \xi\vee \eta$ so in both these cases, $\prec$ determines a directed set.

Now we can define the topological entropy and measure theoretic entropy of a map $\phi$ by

Definition 5 (Topological/Measure theoretic entropy [nets])
The entropies defined above admit a concise definition through nets, given by
\begin{align*}
& h_{top}(\phi) = \lim_{\mathcal{U}}h_{top}(\phi, \mathcal{U}) \qquad \text{over all open covers $\mathcal{U}$} \\
& h_\mu(\phi)~ = \lim_{\xi}h_\mu(\phi, \xi) \qquad \text{over all measurable partitions $\xi$ of finite entropy}
\end{align*}
Proof
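If $\mathcal{U} \prec \mathcal{V}$, then $\bigvee^{n-1}_{i=0} T^{-i}\mathcal{U} \prec \bigvee^{n-1}_{i=0} T^{-i}\mathcal{V}$ for every $n$ by Proposition 6 (1), so Proposition 6 (2) gives $H_{top}(\bigvee^{n-1}_{i=0} T^{-i}\mathcal{U}) \leq H_{top}(\bigvee^{n-1}_{i=0} T^{-i}\mathcal{V})$ and hence
$$h_{top}(T, \mathcal{U}) \leq h_{top}(T, \mathcal{V}),$$ so $\{h_{ top }( T, \mathcal{U} )\}_{\mathcal{U}}$ is a monotone net and hence must converge in $\mathbb{R}\cup \{ \infty \}$ to its supremum.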

The situation for $h_\mu$ is completely analogous, using Proposition 3 (3) in place of Proposition 6 (2).

Note that for compact metric spaces, the above defined net of open covers has a cofinal subsequence, that is, a subnet which is also a sequence. Since all subnets of a convergent net converge to the same limit, we may as well take the limit over this sequence.

To construct the sequence we need the Lebesgue number lemma, which asserts that for any open cover $\mathcal{W}$ of a compact metric space there exists a number $\delta >0 $ such that any disc of radius $\delta$ is completely contained in some element in the cover. Let $\mathcal{U}_n = \{ B_{\frac{1}{n}}(x) ~ | ~ x \in X\}$ be the open cover of all balls of radius $\frac{1}{n}$. Now it follows that for any open cover $\mathcal{W}$ of $X$ with a Lebesgue number $\delta$, we have $\mathcal{W} \prec \mathcal{U}_n$ for all $n\in \mathbb{N}$ such that $\frac{1}{n} \leq \delta$, hence the sequence is cofinal.

– Basic Properties (Measure Theoretic Entropy) –

Here we list some of the basic properties of the entropy. First and foremost let’s check that the definition given above makes sense.

Proposition 1 (Existence of the Limit) The limit
$$h_{\mu}(T, \xi) = \lim_{n\to \infty} \frac{1}{n} H_\mu\left(\bigvee^{n-1}_{k=0} T^{-k}\xi\right)$$
exists, is finite, and is equal to $\inf_{n\in \mathbb{N}} \left\{ \frac{1}{n} H_\mu\left(\bigvee^{n-1}_{k=0}T^{-k}\xi \right) \right\}.$
Proof

With $H_m = H_\mu\left(\bigvee^{m-1}_{k=0}T^{-k}\xi \right)$ (and $H_0 = 0$) we will show that $\limsup_{n\to\infty} H_n/n \leq H_p/p$ for every fixed $p \geq 1$. This suffices to prove the proposition, since then $\limsup_{n} H_n/n \leq \inf_{p} H_p/p \leq \liminf_{n} H_n/n$, and the infimum is finite and bounded below by 0. Noting that $H_{m+n} \leq H_m + H_n$, we may, for some fixed $p$, partition the index set into steps of $p$, that is, we may write $n = i+ kp$ where $0\leq i< p $. Now
\begin{align*}
\frac{H_n}{n} & = \frac{H_{i + kp}}{i + kp} \leq \frac{H_{i}}{i+kp} + \frac{H_{kp}}{i+kp} \leq \frac{H_{i}}{kp} + \frac{H_{kp}}{kp}\\
&\leq \frac{H_{i}}{kp} + \frac{k H_{p}}{kp} = \frac{H_{i}}{kp} + \frac{H_{p}}{p}.
\end{align*}
The result now follows by noting that as $n\to \infty$ with $p$ fixed, $k\to \infty$ while $H_i \leq \max_{0\leq j < p} H_j$ stays bounded, so the first term tends to $0$.
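The subadditivity argument can be checked numerically in a case where everything is computable. The sketch below uses a two-state stationary Markov measure on sequences (an illustrative choice that does not appear in the text), takes $\xi$ to be the partition by the first symbol, and computes $H_n$ by summing over all words of length $n$. The transition matrix `P` and the stationary vector `pi` are hypothetical values.

```python
import itertools
import math

# Hypothetical two-state Markov chain; pi is stationary since pi P = pi.
P = [[0.9, 0.1], [0.4, 0.6]]
pi = [0.8, 0.2]

def word_probability(word):
    """Measure of the cylinder set of sequences starting with `word`."""
    prob = pi[word[0]]
    for a, b in zip(word, word[1:]):
        prob *= P[a][b]
    return prob

def join_entropy(n):
    """H_n: entropy of the join of xi, T^{-1}xi, ..., T^{-(n-1)}xi."""
    probs = map(word_probability, itertools.product((0, 1), repeat=n))
    return -sum(p * math.log(p) for p in probs if p > 0)

H = [0.0] + [join_entropy(n) for n in range(1, 11)]   # H[0] = 0
averages = [H[n] / n for n in range(1, 11)]           # H_n / n
# Standard entropy rate of a stationary Markov chain, for comparison.
entropy_rate = -sum(pi[a] * P[a][b] * math.log(P[a][b])
                    for a in (0, 1) for b in (0, 1))
```

In this example the sequence `averages` is non-increasing and stays above `entropy_rate`, approaching it as $n$ grows, which matches the statement that the limit equals the infimum.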

Proposition 2

  • $h_\mu(id) = 0$
  •  $h_\mu(T^k) = kh_\mu(T)$ for every $k\in \mathbb{N}$
Proof
    • By definition we know that $H_\mu(\xi\vee\xi) = -\sum_{(C, D) \in \xi \times \xi} \mu(C\cap D) \log(\mu(C\cap D))$, but note these summands are zero off the diagonal of $\xi\times \xi$, hence we get that
      $$H_\mu(\xi \vee \xi) = H_\mu(\xi).$$
      Consequently
      $$h_\mu(id, \xi) = \lim_{n\to \infty} \frac{1}{n} H_\mu(\bigvee_{k=0}^{n-1}id^{-k}\xi) = \lim_{n\to \infty} \frac{1}{n} H_\mu(\bigvee_{k=0}^{n-1}\xi) = \lim_{n\to \infty} \frac{1}{n} H_\mu(\xi) = 0 $$ and the claim follows.
    • Though it is not generally true that $H_\mu(\bigvee^{n-1}_{j=0}T^{-kj}\xi)$ equals $H_\mu(\bigvee^{kn-1}_{i=0}T^{-i}\xi)$, this will not matter in the limit. Let $\xi$ be a measurable partition of $X$ (of finite entropy), then
      \begin{align*}
      h_\mu(T^k, \bigvee^{k-1}_{i=0}T^{-i} \xi) &= \lim_{n\to \infty} \frac{H_\mu(\bigvee^{nk-1}_{i=0} T^{-i} \xi)}{n}\\
      & = k \lim_{n\to \infty} \frac{H_\mu(\bigvee^{nk-1}_{i=0} T^{-i} \xi)}{kn}\\
      & = k h_\mu(T, \xi).\end{align*}
      Taking supremums over all partitions, we get $h_\mu(T^k) = \sup_{\xi}\{ h_\mu(T^k, \xi) \} \geq \sup_{\xi}\{ h_\mu(T^k, \bigvee^{k-1}_{i=0}T^{-i} \xi ) \} = \sup_{\xi}\{ k h_\mu(T, \xi)\} = k h_\mu(T).$
      The reverse inequality follows from $\xi \prec \bigvee^{k-1}_{i=0}T^{-i} \xi$ and Proposition 3 (3), which give
      $$h_\mu(T^k, \xi) \leq h_\mu(T^k, \bigvee^{k-1}_{i=0}T^{-i} \xi) = k h_\mu(T, \xi) \leq k h_\mu(T).$$

Other noteworthy properties of the measure theoretic entropy are listed in chapter 4.5 in [1]. We state them here without proof. Let $\xi, \eta$ be finite entropy measurable partitions of $X$.

Proposition 3

  1.  $h_\mu(T, \xi) \leq H_\mu(\xi)$
  2.  $h_\mu(T, \xi \vee \eta) \leq h_\mu(T, \xi) + h_\mu(T, \eta) $
  3.  $\xi \prec \eta \qquad \Rightarrow h_\mu(T, \xi) \leq h_\mu(T,\eta)$
  4.  if $k \geq 1$, then $h_\mu(T, \xi ) = h_\mu(T, \bigvee^{k-1}_{i=0}T^{-i} \xi)$
  5.  (continuity) $| h_\mu(T, \xi) - h_\mu(T, \eta)| \leq d(\xi, \eta)$, where $d$ is the metric on the space of finite partitions defined by
    $$d(\xi, \eta) = H_\mu(\xi | \eta) + H_\mu(\eta | \xi).$$
  6.  $h_\mu(T, \xi) \leq h_\mu(T, \eta) + H_\mu(\xi | \eta)$
  7.  $H_\mu(\xi) \leq \log(|\xi|)$ if $\xi$ is finite, where $|\xi|$ denotes the number of elements in $\xi$.

The following theorem is used repeatedly, is easy to state, and has an elegant one-line proof, so there is no good reason not to include it here.

Theorem 1 (Krylov-Bogoliubov)
Let $X$ be a compact metric space, $T: X\to X$ a continuous map, and $M(X, T)$ the space of $T$-invariant probability measures on $X$. The space $M(X, T)$ is non-empty.
Proof
The map $\mu \mapsto T_\star \mu$ (the pushforward measure) is continuous with respect to the weak*-topology on $M(X)$, which is convex and compact (by Alaoglu), hence by the Schauder-Tychonoff fixed point theorem it has a fixed point, and a fixed point is precisely a $T$-invariant measure.
Proposition 4
The entropy map is an affine map, i.e. it maps convex combinations to convex combinations.
Proof
Here we present a sketch of the proof, which can be found in [1].

It is not too hard to check that by concavity of the function $x\mapsto -x\log(x)$ we get the inequality
$$ 0 \leq H_{p\mu + (1 - p )m}(\xi) - pH_\mu(\xi) - (1-p)H_m(\xi).$$
By log-sum arithmetic, one can also produce the inequality
$$ H_{p\mu + (1 - p )m}(\xi) - pH_\mu(\xi) - (1-p)H_m(\xi) \leq \log(2).$$
Applying both inequalities to the partition $\bigvee_{i=0}^{n-1} T^{-i} \xi$ and dividing by $n$, we get that

\begin{align*}
0 \leq \frac{H_{p\mu + (1 - p )m}(\bigvee_{i=0}^{n-1} T^{-i} \xi)}{n} & - \frac{ pH_\mu(\bigvee_{i=0}^{n-1} T^{-i} \xi)}{n} \\ & - \frac{(1-p)H_m(\bigvee_{i=0}^{n-1} T^{-i} \xi)}{n} \leq \frac{\log(2)}{n}
\end{align*}
which, as $n\to \infty$, yields
$$h_{p\mu + (1 - p )m}(T, \xi) = ph_\mu(T, \xi)+ (1-p)h_m(T, \xi).$$

Deviating a bit from Walters' proof, we may take the limit over the net of all measurable partitions (of finite entropy) in the above equality, which produces the equality
$$h_{p\mu + (1 - p )m}(T) = ph_\mu(T)+ (1-p)h_m(T).$$
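The two inequalities at the heart of this sketch are easy to test numerically. The following snippet draws random pairs of probability vectors, playing the role of the values of $\mu$ and $m$ on the cells of a finite partition $\xi$, and records the gap $H_{p\mu + (1-p)m}(\xi) - pH_\mu(\xi) - (1-p)H_m(\xi)$. It is only a sanity check under these assumptions, not a proof, and the helper names and the random seed are arbitrary.

```python
import math
import random

def entropy(masses):
    """Entropy of a finite partition given the measures of its cells."""
    return -sum(q * math.log(q) for q in masses if q > 0)

def random_distribution(k, rng):
    """A random probability vector of length k."""
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)          # fixed seed, arbitrary choice
gaps = []
for _ in range(1000):
    k = rng.randint(2, 8)       # number of cells in the partition
    mu = random_distribution(k, rng)
    m = random_distribution(k, rng)
    p = rng.random()
    mix = [p * a + (1 - p) * b for a, b in zip(mu, m)]
    gaps.append(entropy(mix) - p * entropy(mu) - (1 - p) * entropy(m))
```

Every recorded gap lies between $0$ and $\log(2)$, as the two inequalities predict. Since the bound does not depend on the partition, dividing by $n$ removes it in the limit.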

As the next theorem shows, the above proposition can be extended to arbitrary convex combinations of ergodic probability measures, which are defined as measures $\mu \in M(X,T)$ for which $T^{-1}A = A \Rightarrow \mu(A) \in \{ 0, 1 \}$. It relies on the ergodic decomposition theorem, which I hope to cover in a subsequent post.

Theorem 2 (Jacobs)
If $T: X\to X$ is a continuous map on a compact metrizable space and $\mu \in M(X,T)$ has ergodic decomposition
$$\mu = \int_{EM(T, X)} \nu d\eta(\nu)$$
where $EM(T, X) = \{ \text{ all ergodic probability measures on } X \}$, then
$$h_\mu(T) = \int_{EM(T, X)} h_\nu(T) d\eta(\nu).$$

Consult Theorem 4.11 of [1] for the proof of the following important result.

Proposition 5
Entropy is conjugacy invariant (hence an isomorphism invariant in the category of measure spaces).

 

– Basic Properties (Topological Entropy) –

In this section $X, Y$ are compact metrizable spaces, the maps in question will always be continuous and $\mathcal{U}$ will denote an open cover of $X$. First we collect some basic results, whose proofs are all more or less straightforward. Recall that the relation of refinement, which we denoted $\prec$, is used to order the collection of open covers.

Proposition 6

  1. If $\mathcal{U} \prec \mathcal{U}'$ and $\mathcal{V} \prec \mathcal{V}'$, then $\mathcal{U}\vee \mathcal{V} \prec \mathcal{U}' \vee \mathcal{V}'$.
  2. If $\mathcal{U} \prec \mathcal{U}'$, then $N(\mathcal{U})\leq N(\mathcal{U}')$ and so $H_{top}(\mathcal{U}) \leq H_{top}(\mathcal{U}')$.
  3. $N(\mathcal{U} \vee \mathcal{U}') \leq N(\mathcal{U}) N(\mathcal{U}')$ and so $H_{top}(\mathcal{U} \vee \mathcal{U}') \leq H_{top}(\mathcal{U}) + H_{top}(\mathcal{U}') $.
  4. For any continuous $F: X\to X$, we have $N(F^{-1}\mathcal{U}) \leq N(\mathcal{U})$, and so $H_{top}(F^{-1}\mathcal{U}) \leq H_{top}(\mathcal{U})$.
  5. For a homeomorphism $F: X\to X$ we have $N(F^{-1}\mathcal{U}) = N(\mathcal{U})$, and so $H_{top}(F^{-1}\mathcal{U}) = H_{top}(\mathcal{U})$.
Proof
  1. Let $C' \in \mathcal{U}'\vee \mathcal{V}'$, then $C' = U' \cap V'$ for some $U'\in \mathcal{U}'$ and $V'\in \mathcal{V}'$, hence there are $U\in \mathcal{U}$ and $V \in \mathcal{V}$ such that $U'\subset U$ and $V' \subset V$, so $C' \subset C = U\cap V \in \mathcal{U} \vee \mathcal{V}$. The claim follows.
  2. If $\mathcal{U} \prec \mathcal{V}$, that is $\mathcal{V}$ refines $\mathcal{U}$, then for any subcover $\{V_1, \ldots, V_{N(\mathcal{V})} \}\subset \mathcal{V}$ of minimal cardinality we can construct a subcover of $\mathcal{U}$ with (at most) the same cardinality, $\{ U_1, \ldots, U_{N(\mathcal{V})} \} \subset \mathcal{U}$, where $V_i \subset U_i$. It thus follows that $$N(\mathcal{U}) \leq N(\mathcal{V}).$$ Taking log on both sides shows $H_{top}(\mathcal{U}) \leq H_{top}(\mathcal{V})$.
  3. If $\mathcal{U}_0 \subset \mathcal{U}$ and $\mathcal{V}_0\subset \mathcal{V}$ are subcovers of $X$ of minimal cardinality, then $\{ U\cap V ~|~ U\in \mathcal{U}_0, ~ V \in \mathcal{V}_0 \}$ is a subcover of $\mathcal{U} \vee \mathcal{V}$ with (at most) $card(\mathcal{U}_0) card(\mathcal{V}_0) = N(\mathcal{U})N(\mathcal{V})$ elements. The claim follows.
  4. If $\{U_1, \ldots, U_{N(\mathcal{U})}\}$ is a subcover of $\mathcal{U}$ of minimal cardinality, then $\{F^{-1}U_1, \ldots, F^{-1}U_{N(\mathcal{U})}\}$ is a subcover of $F^{-1}\mathcal{U}$, since taking preimages preserves unions and $F^{-1}X = X$.
  5. This follows by applying (4) to $F^{-1}$ and the cover $F^{-1}\mathcal{U}$, which gives $N(\mathcal{U}) = N(F(F^{-1}\mathcal{U})) \leq N(F^{-1}\mathcal{U})$.
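Items (2) and (3) can be tested by brute force in a toy setting. The sketch below works on a five-point set with the discrete topology (an assumption made purely for illustration, so that every family of subsets covering the set is an open cover) and computes $N(\cdot)$ by exhaustive search. All names and parameters are hypothetical.

```python
import itertools
import random

def min_subcover_size(cover, space):
    """N(cover): least number of members of `cover` whose union is `space`."""
    for size in range(1, len(cover) + 1):
        for sub in itertools.combinations(cover, size):
            if frozenset().union(*sub) == space:
                return size
    raise ValueError("the given family does not cover the space")

def join(cover_u, cover_v):
    """All intersections U ∩ V; empty ones are dropped, which leaves N unchanged."""
    return list({u & v for u in cover_u for v in cover_v if u & v})

def random_cover(space, members, rng):
    """A random family of subsets, patched so that it covers `space`."""
    cover = {frozenset(rng.sample(sorted(space), rng.randint(2, 4)))
             for _ in range(members)}
    missed = space - frozenset().union(*cover)
    if missed:
        cover.add(frozenset(missed))
    return list(cover)

rng = random.Random(1)               # fixed seed, arbitrary choice
space = frozenset(range(5))
checks = []
for _ in range(30):
    U = random_cover(space, 4, rng)
    V = random_cover(space, 4, rng)
    checks.append((min_subcover_size(U, space),
                   min_subcover_size(V, space),
                   min_subcover_size(join(U, V), space)))
```

For every triple in `checks` the value for the join lies between the larger of $N(\mathcal{U})$, $N(\mathcal{V})$ and their product, as items (2) and (3) require. The exhaustive search is exponential in the size of the cover, so this is only practical for very small examples.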

The first thing we should check is that the definition of the topological entropy given above is well defined.

Proposition 7 (Existence of a Limit)
If $\mathcal{U}$ is any open cover of $X$, the limit
$$h_{top}(\phi, \mathcal{U}) = \lim_{n\to \infty}\frac{1}{n} H_{top}(\bigvee^{n-1}_{i=0} \phi^{-i}\mathcal{U})$$
exists and is finite.
Proof
\begin{align*}
H_{top}(\bigvee^{n + m -1}_{i=0} \phi^{-i}\mathcal{U})
& \leq H_{top}(\bigvee^{ m -1}_{i=0} \phi^{-i}\mathcal{U} ) + H_{top}(\phi^{-m}(\bigvee^{n -1}_{i=0} \phi^{-i}\mathcal{U})) \\
& \leq H_{top}(\bigvee^{ m -1}_{i=0} \phi^{-i}\mathcal{U} ) + H_{top}(\bigvee^{n - 1}_{i=0} \phi^{-i}\mathcal{U})
\end{align*}
where the first and second inequalities follow from Proposition 6 (3) and (4) respectively.
Writing $H_m$ for $H_{top}(\mathcal{U} \vee \ldots \vee \phi^{-m +1}\mathcal{U})$ we see that
$$H_{m+n} \leq H_m + H_n$$ so we may repeat the proof of Proposition 1.

We have the following collection of basic properties relating to the topological entropy.

Proposition 8

  1. Entropy is a topological invariant.
  2. $h_{top}(\phi^k) = kh_{top}(\phi)$ for all $k\in \mathbb{N}$.
Proof
  1. Assume $\psi: X\to Y$ is a homeomorphism of compact spaces, $\phi:X\to X$ is continuous, and $\mathcal{O}$ an open cover of $X$, then
    \begin{align*}
    h_{top}(\psi\phi\psi^{-1}, \psi \mathcal{O}) & = \lim_{n\to \infty} \frac{1}{n} H_{top}(\psi\mathcal{O} \vee \ldots \vee \psi\phi^{-n+1}\psi^{-1}\psi\mathcal{O})\\
    &= \lim_{n\to \infty} \frac{1}{n} H_{top}(\psi( \mathcal{O} \vee \phi^{-1}\mathcal{O} \vee \ldots \vee \phi^{-n+1}\mathcal{O} ) ) \\
    & = \lim_{n\to \infty} \frac{1}{n} H_{top}(\mathcal{O} \vee \phi^{-1}\mathcal{O} \vee \ldots \vee \phi^{-n+1}\mathcal{O} ) ~ \text{ (as in Prop. 6 (5)) }\\
    & = h_{top}(\phi, \mathcal{O}).
    \end{align*}
    Since $\psi$ is a homeomorphism, $\mathcal{O} \mapsto \psi\mathcal{O}$ is a bijection between the open covers of $X$ and those of $Y$, so taking supremums over all open covers $\mathcal{O}$ concludes the proof.
  2. For any open cover $\mathcal{O}$ we have
    \begin{align*}
    h_{top}(\phi^k) & \geq h_{top}(\phi^k, \mathcal{O} \vee \phi^{-1}\mathcal{O} \vee \ldots \vee \phi^{-k+1}\mathcal{O})\\
    &= k \lim_{n\to \infty} \frac{1}{nk} H_{top}(\mathcal{O} \vee \ldots \vee \phi^{-nk+1}\mathcal{O})\\
    &= k h_{top}(\phi, \mathcal{O})
    \end{align*}
    so, taking the supremum over $\mathcal{O}$, we get $h_{top}(\phi^k)\geq kh_{top}(\phi)$. Conversely, since $\bigvee^{n-1}_{i=0} \phi^{-ki } \mathcal{U} \prec \bigvee^{kn -1}_{i=0} \phi^{-i} \mathcal{U}$, Proposition 6 (2) yields
    \begin{align*}
    h_{top}(\phi) \geq h_{top}(\phi, \mathcal{U}) & = \lim_{n\to \infty} \frac{H_{top}(\bigvee^{kn -1}_{i=0} \phi^{-i} \mathcal{U})}{kn} \\
    & \geq \lim_{n\to \infty} \frac{H_{top }(\bigvee^{n-1}_{i=0} \phi^{-ki}\mathcal{U})}{nk} = \frac{h_{top}(\phi^k, \mathcal{U})}{k},
    \end{align*}
    so the reverse inequality also holds after taking the supremum over $\mathcal{U}$.

– Basic Properties (Metric Entropy) –

In this section $(X,d)$ is a compact metric space.

Some properties of Metric entropy

  • If $T$ is an isometry, then $h_d(T) = 0$
  • If two metrics $d$ and $d’$ on $X$ are equivalent, then $$h_d(T) = h_{d’}(T).$$
  • If $(X, \phi)$ is a compact dynamical system, such that $\{\phi^n\}$ is equicontinuous, $h_{top}(\phi) = 0 $

Two metrics $d$ and $d'$ are said to be equivalent (denoted $d \sim d'$) if they induce the same topology. On a compact metric space $X$, this implies that the metrics are uniformly equivalent: for every $\epsilon > 0$ there is a $\delta > 0$ such that $d'(x,y) < \delta$ implies $d(x, y) < \epsilon$ for all $x, y \in X$, and the same holds with the roles of $d$ and $d'$ exchanged. (The identity map from $(X, d')$ to $(X, d)$ is continuous on a compact space, hence uniformly continuous.) On a general metric space, $\delta$ is allowed to depend on $x$.

Proof
  • If $T$ is an isometry, then $d = d_T^n$, so $N_d(T, n, \epsilon)$ is constant as $n$ varies, and has been shown previously to be finite, hence $\limsup_{n\to \infty} \frac{1}{n} \log(N_d(T, n, \epsilon)) = 0$.
  • With $h_d(T,\epsilon) = \limsup_{n\to \infty} \frac{1}{n} \log(N_d(T, n, \epsilon))$, we noted in the definition that $h_d(T, \epsilon)$ does not decrease as $\epsilon$ decreases. Assume $d\sim d'$, fix $\epsilon > 0$ and choose $\delta > 0$ such that $d'(x,y) < \delta$ implies $d(x,y) < \epsilon$. If $d_T^n(x, y) \geq \epsilon$, then $d(T^ix, T^iy) \geq \epsilon$ for some $0 \leq i \leq n-1$, hence $d'(T^ix, T^iy) \geq \delta$ and so $(d')_T^n(x,y) \geq \delta$. Every $(n, \epsilon)$-separated set with respect to $d$ is therefore $(n, \delta)$-separated with respect to $d'$, which yields
    $$N_d(T, n, \epsilon) \leq N_{d'}(T, n, \delta)$$
    and
    $$h_{d}(T, \epsilon) \leq h_{d'}(T, \delta) \leq h_{d'}(T).$$
    Letting $\epsilon \to 0$ we get $h_d(T) \leq h_{d'}(T)$. Exchanging the roles of $d$ and $d'$ gives the reverse inequality, and we conclude that $h_d(T) = h_{d'}(T)$.
  • If $\{ \phi^n\}$ is equicontinuous, then (by compactness) for every $\epsilon > 0$ there exists a $\delta > 0$ such that $d^n_\phi(x, y) < \epsilon $ for all $n$, if $d(x,y) < \delta$. It follows that for a fixed $\epsilon$ and any $n$, $N_d(\phi, \epsilon, n)$ is at most the number of $\frac{\delta}{2}$ balls needed to cover $X$, a bound which does not depend on $n$. Hence $\limsup_{n\to \infty} \frac{\log(N_d(\phi, \epsilon, n))}{n} = 0$ for all $\epsilon$, so $h_d(\phi) = 0$, which equals $h_{top}(\phi)$ by Theorem 3 below.

– Relations between the definitions –

In the literature the topological and metric entropies are used interchangeably, the reason for which is the following theorem.

Theorem 3
If $X$ is a compact metric space, and $f: X\to X$ is any continuous map, then
$$h_{top}(f) = h_d(f)$$
Proof
We will show $ N_d(T, n, \epsilon) \leq N(\mathcal{U}^n) \leq N_d(T, n, \frac{\delta}{2}) $, where we write $T$ for $f$, $\delta$ is a Lebesgue number of the open cover $\mathcal{U}$, $\mathcal{U}^n := \bigvee^{n-1}_{i=0}T^{-i}\mathcal{U} $, and $\epsilon$ is any number strictly larger than the supremum of the diameters of the members of $\mathcal{U}$ (which is finite since $X$ is compact).

Step 1: First let’s produce the following inequality
$$N_d(T, n, \epsilon) \leq N(\mathcal{U}^n).$$
Let $V\in \mathcal{U}^n$, that is $V = C_0 \cap T^{-1}C_1 \cap \ldots \cap T^{-n+1}C_{n-1}$ for some $C_i \in \mathcal{U}$. Note that we have $ diam(T^{i}V) < \epsilon$ since $T^iV \subset T^iT^{-i}C_i \subset C_i$ and hence

\begin{align*} diam(T^iV) \leq diam(C_i) < \epsilon.
\end{align*}

It follows that if $x,y \in V$, then
\begin{align*}
& d(x,y)< \epsilon \tag{since $x,y \in C_0$ }\\
& d(Tx, Ty) < \epsilon \tag{since $Tx, Ty \in C_1$}\\
& …
\end{align*}
hence $d_T^n(x,y) = \max_{i=0, \ldots, n-1} d(T^ix, T^iy) < \epsilon$. So if $F$ is any $(n, \epsilon)$-separated set, it can contain at most one point in each element of a subcover of $\mathcal{U}^n$. This proves the inequality and consequently we get that
\begin{align*}
h_{top}(T, \mathcal{U}) & = \lim_{n\to \infty} \frac{1}{n} log(N(\mathcal{U}^n)) \geq \limsup_{n\to \infty} \frac{1}{n} log(N_d(T, n, \epsilon)) \\ & = h_d(T,\epsilon)
\end{align*}
Step 2 : If $\delta$ is a Lebesgue number of $\mathcal{U}$, we will now prove that
$$ N(\mathcal{U}^n) \leq N_d(T, n, \frac{\delta}{2}).$$

First note that if $\delta$ is a Lebesgue number for $\mathcal{U}$ with respect to the metric $d$, then $\delta$ is a Lebesgue number for $\mathcal{U}^n$ with respect to the metric $d_T^n$. To see this, let $B_{\delta}^n(z)$ be a $\delta$-disc in the metric $d_T^n$ centred at some arbitrary $z\in X$. If $x \in B_{\delta}^n(z)$ then $d_T^n(x,z) < \delta$ from which we get
\begin{align*}
& d(x,z) < \delta \\
& d(Tx, Tz )< \delta \\
& d(T^2x, T^2z )< \delta \\
& \ldots
\end{align*}
So, for any $0 \leq i \leq n-1$ we have that $\{ T^ix ~ |~ x \in B_{\delta}^n(z) \} $ is contained in the $d$-ball of radius $\delta$ around $T^iz$, hence also contained in some $C_i \in \mathcal{U}$ (since $\delta$ is a Lebesgue number of $\mathcal{U}$). Define $V\in \mathcal{U}^n$ by $V = C_0 \cap T^{-1}C_1 \cap \ldots \cap T^{-n + 1} C_{n-1}$. It’s now easy to check that $B_{\delta}^n(z) \subset V$.

Now let $F$ be an $(n, \frac{\delta}{2})$-separated set of maximal cardinality. By maximality, every point of $X$ lies within $d_T^n$-distance less than $\frac{\delta}{2}$ of some point of $F$ (otherwise it could be added to $F$), so the collection $\mathcal{B}$ of $\delta$-neighbourhoods (in the metric $d_T^n$) centred at the points of $F$ covers $X$. By the previous paragraph $\mathcal{B}$ is a refinement of $\mathcal{U}^n$, and hence $N(\mathcal{U}^n) \leq N(\mathcal{B}) \leq |F| = N_d(T, n, \frac{\delta}{2})$. The claim now follows, and we get the inequality

\begin{align*} h_{top}(T, \mathcal{U}) & = \lim_{n\to \infty} \frac{1}{n}log(N(\mathcal{U}^n)) \leq \limsup_{n} \frac{1}{n}log(N_d(T, n, \frac{\delta}{2})) \\ & = h_d(T, \frac{\delta}{2} )
\end{align*}

Letting $\mathcal{B}_{n} = \{\text{all balls of radius } \frac{1}{n} \}$ be the (cofinal) sequence of open covers of $X$ defined earlier, we have seen that $h_{top}(T) = \lim_{n\to \infty} h_{top}(T, \mathcal{B}_n)$. Applying the two inequalities above to $\mathcal{U} = \mathcal{B}_n$, both $\epsilon $ and $\delta$ go to zero as $n\to \infty$, hence the inequalities become equalities in the limit and we get
$$h_d(T) = h_{top}(T).$$

The variational principle relates the topological/metric entropy on a compact metric space with the measure theoretic entropy with respect to regular Borel measures. Explicitly we have

Theorem 4 (Variational Principle)
If $X$ is a compact metric space, and $f: X\to X$ is any continuous map, then
$$h_{top}(f) = \sup\{ h_\mu(f) ~ |~ \mu \in M(X, f) \text{ is a regular Borel probability measure on } X \}$$
Proof
Write $T$ for $f$ and fix $\mu \in M(X, T)$. Pick an arbitrary finite (Borel) measurable partition $\xi = \{A_1, \ldots, A_k \}$ of $X$ with $k \geq 2$. By regularity we may pick compact sets $B_i \subset A_i$ such that $\mu(A_i \backslash B_i) < \epsilon = \frac{1}{k\log(k)}$, and let $\nu = \{B_0, B_1, \ldots, B_k \}$ be the measurable partition with $B_0 = X\backslash \bigcup^k_{i=1} B_i$. The conditional entropy $H_\mu(\xi | \nu) $ is bounded above by one, since (with $\phi(x) = x\log(x)$)

\begin{align*}
H_\mu(\xi|\nu) & = -\sum_{j= 0}^k\sum_{i= 1}^k \mu(B_j) \phi\left(\frac{\mu(B_j\cap A_i)}{\mu(B_j)}\right) \\
& = -\mu(B_0) \sum_{i= 1}^k\phi\left(\frac{\mu(B_0\cap A_i)}{\mu(B_0)}\right) \\
& \quad \underbrace{-\mu(B_1) \sum_{i= 1}^k\phi\left(\frac{\mu(B_1\cap A_i)}{\mu(B_1)}\right) - \ldots -\mu(B_k) \sum_{i= 1}^k\phi\left(\frac{\mu(B_k\cap A_i)}{\mu(B_k)}\right)}_{= 0} \\
& = -\mu(B_0) \sum_{i= 1}^k\phi\left(\frac{\mu(B_0\cap A_i)}{\mu(B_0)}\right)
\end{align*}
This follows, since, with $j > 0$, we get $\frac{\mu(B_j \cap A_i)}{\mu(B_j)} = \begin{cases} 0 & i\neq j \\ 1 & i=j \end{cases}$, and either way $\phi(1) = 0$ and (by convention) $\phi(0) = 0$.

The remaining sum is the entropy of the partition $\{B_0\cap A_1, \ldots, B_0 \cap A_k\}$ of $B_0$ with respect to the normalised measure $\mu(\,\cdot \cap B_0)/\mu(B_0)$, so by Proposition 3 (7) it is at most $\log(k)$. Inserting this into the above equation, we get the inequality
\begin{align*}
H_\mu(\xi|\nu) & \leq \mu(B_0)\log(k).
\end{align*}
Now $\mu(B_0) = \mu(X\backslash \bigcup_{i=1}^kB_i) = \mu(\bigcup_{i=1}^k (A_i \backslash B_i)) < k \epsilon$, hence
\begin{align*}
H_\mu(\xi|\nu) & \leq \mu(B_0)\log(k) < k\epsilon \log(k) = 1.
\end{align*}
From Proposition 3 (6) we get
$$ h_\mu(T, \xi) \leq h_\mu(T, \nu) + H_\mu(\xi | \nu) < h_\mu(T, \nu) + 1.$$

Now define the collection $\beta = \{B_0\cup B_1, \ldots, B_0\cup B_k \}$. Note that $\beta$ is an open cover of $X$, since $B_0 \cup B_i = X \backslash \bigcup_{j\neq i} B_j$ and the $B_j$ are compact! For each $a \in \bigvee_{i=0}^{n-1}T^{-i}\beta$ there are at most $2^n$ distinct non-empty elements of $\bigvee_{i=0}^{n-1}T^{-i} \nu$ contained in $a$. To see this, let $a = (B_0 \cup B_{j_0}) \cap T^{-1}(B_0 \cup B_{j_1}) \cap \ldots \cap T^{-n+1}(B_0 \cup B_{j_{n-1}})$. Expanding the intersection, $a$ is the union of the sets $b = B_{i_0}\cap T^{-1}(B_{i_1}) \cap \ldots \cap T^{-n +1}(B_{i_{n-1}}) \in \bigvee_{i=0}^{n-1}T^{-i} \nu$ with
\begin{align*}
& B_{i_0} \in \{ B_0, B_{j_0} \}, \\
& B_{i_1} \in \{ B_0, B_{j_1} \}, \\
& \ldots \\
& B_{i_{n-1}} \in \{ B_0, B_{j_{n-1}} \},
\end{align*}
which gives a total of (at most) $2^n$ possible combinations. Since $\bigvee_{i=0}^{n-1}T^{-i} \nu$ is a partition, these are the only non-empty elements of it that meet $a$.

Every non-empty element of $\bigvee_{i=0}^{n-1}T^{-i} \nu$ meets some member of a subcover of $\bigvee_{i=0}^{n-1}T^{-i}\beta$ of minimal cardinality, so the number of non-empty elements of $\bigvee_{i=0}^{n-1}T^{-i} \nu$ is at most $2^n N(\bigvee_{i=0}^{n-1}T^{-i}\beta)$, and by Proposition 3 (7)
$$H_\mu(\bigvee_{i=0}^{n-1}T^{-i} \nu) \leq \log\left(2^n N(\bigvee_{i=0}^{n-1}T^{-i}\beta)\right).$$
Inserting this into the definitions, we get
\begin{align*}
h_\mu(T, \xi) & < h_\mu(T, \nu) + 1 = \lim_{n\to \infty} \frac{H_\mu(\bigvee^{n-1}_{i=0}T^{-i}\nu)}{n} + 1 \\
& \leq \lim_{n\to \infty} \frac{\log(2^nN(\bigvee_{i=0}^{n-1} T^{-i}\beta))}{n} + 1 = h_{top}(T, \beta) + \log(2) +1 \\
& \leq h_{top}(T) + \log(2) +1.
\end{align*}
Taking the supremum over $\xi$ (finite partitions suffice for the supremum in Definition 1) gives $h_\mu(T) \leq h_{top}(T) + \log(2) + 1$. Substituting $T$ with $T^n$, which also preserves $\mu$, and using Proposition 2 and Proposition 8 (2), we get that $n h_\mu(T) \leq n h_{top}(T) + \log(2) + 1,$ which, after dividing by $n$ and letting $n \to \infty$, shows that $$h_\mu(T) \leq h_{top}(T)$$ for all continuous maps $T$ on $X$. This concludes the first part of the proof.

Now we will show that we can find a $\mu \in M(X, T)$ with $h_\mu(T)$ arbitrarily close to $h_d(T)$ (the metric entropy), which has been shown to be equal to $h_{top}(T)$. To this end, fix $\epsilon > 0 $, let $E_n^\epsilon \subset X$ be an $(n, \epsilon)$-separated set of maximal cardinality (that is $|E_n^\epsilon | = N_d(T, n, \epsilon)$) and define
\begin{align*}
& \sigma_n = \frac{1}{|E_n^\epsilon|} \sum_{x\in E_n^\epsilon} \delta_x \qquad \qquad \text{where $\delta_x$ is the Dirac measure at $x$,} \\
& \mu_n = \frac{1}{n} \sum_{i=0}^{n-1} \sigma_n\circ T^{-i}.
\end{align*}
By compactness of $M(X)$, we may find a subsequence, indexed by $n_i$, such that $h_d(T, \epsilon) = \lim_{i\to \infty} \frac{1}{n_i} \log(N_d(T, n_i, \epsilon))$, and $\mu_{n_i}$ converges in the vague (or weak*) topology on $M(X)$ to some $\mu$. The measures $\mu_n$ need not be $T$-invariant, but for every continuous $g: X \to \mathbb{R}$ we have
$$\left| \int_X g\circ T \, d\mu_n - \int_X g \, d\mu_n \right| = \frac{1}{n} \left| \int_X (g\circ T^n - g) \, d\sigma_n \right| \leq \frac{2 \|g\|_\infty}{n},$$
so by definition of vague convergence it follows that
\begin{align*}
\int_{X} g \, d(\mu \circ T^{-1}) &= \int_{X} g\circ T \, d\mu = \lim_{i \to \infty} \int_{X} g \circ T \, d\mu_{n_i} \\
& = \lim_{i \to \infty} \int_{X} g \, d\mu_{n_i} = \int_{X} g \, d \mu
\end{align*}
so $\mu \in M(X, T)$. We will show that $$\limsup_{n\to \infty} \frac{1}{n} \log(N_d(T, n, \epsilon)) \leq h_\mu(T)$$ which, since $\epsilon$ was arbitrary, shows that we may approximate $h_{top}(T)$ from below by $h_\mu(T)$ where $\mu \in M(X, T)$ is a regular Borel measure. For this we will need the following lemma.

Lemma 1

  1. If $x\in X$ and $\delta > 0$ there exists a $0 < \delta' < \delta$ such that $\mu(\partial (B_{\delta'}(x))) = 0$ (that is, the measure of the boundary of the $\delta'$-ball centered at $x$ is zero).
  2. If $\delta > 0$ there is a finite partition $\xi = \{ A_1, \dots, A_k \}$ of $X$ such that $diam(A_j) < \delta$ and $\mu(\partial A_j) = 0 $ for each $j$.
  3. If all members of $\xi$ have boundary measure zero, then all members of $\bigvee^{n-1}_{i=0}T^{-i}\xi$ have boundary measure zero.

Proof

  1. The boundaries $\partial B_{\delta'}(x)$, $0 < \delta' < \delta$, are pairwise disjoint subsets of $B_{\delta}(x)$, so $\mu(\bigcup_{\delta' < \delta}\partial B_{\delta'}(x)) \leq \mu(B_{\delta}(x)) \leq 1$. Hence at most countably many of them can have positive measure, so uncountably many $\delta'$ satisfy this property.
  2. By (1) and compactness there is an open cover $\{B_1, \dots, B_k \}$ of $X$ by balls of radius $<\frac{\delta}{2}$ whose boundaries have measure zero. Define the partition by $A_1 = \overline{B_1}$ and $A_n = \overline{B_n} \backslash (\overline{B_1} \cup \dots \cup \overline{ B_{n-1}})$. One can check that $\{A_1, \dots, A_{k} \}$ does the job.
  3. Let $C= C_{i_0} \cap T^{-1}(C_{i_1})\cap \dots \cap T^{-n+1}(C_{i_{n-1}})$ with each $C_{i_j} \in \xi$. Since $T$ is continuous and $\mu$ is $T$-invariant, we have
    $$\mu(\partial C ) = \mu(\partial (\bigcap_{j=0}^{n-1} T^{-j} C_{i_j})) \leq \mu(\bigcup_{j=0}^{n-1}T^{-j}\partial C_{i_j}) = 0.$$

Why do we care about sets with boundary measure zero? The reason is that if $\eta_n$ is a sequence of probability measures which converges to $\eta$ in the vague topology, and $B$ is a Borel set with $\eta(\partial B) = 0$, then
$$ \eta(B) = \lim_{n\to \infty} \eta_n(B).$$
As a consequence for any partition $\xi$ consisting of sets with zero boundary measure, we have

\begin{equation}
\lim_{n\to \infty} H_{\mu_n}(\xi) = H_{\mu}(\xi)
\end{equation}

Employing this lemma, let $\xi = \{A_1, \dots, A_k \}$ be a measurable partition of $X$ such that $\mu(\partial A_i) = 0$ and $diam(A_i) < \epsilon$. We have
\begin{equation}
\label{eq:3}
H_{\sigma_n}(\bigvee_{i=0}^{n-1}T^{-i} \xi) = -\sum_{C\in \bigvee_{i=0}^{n-1}T^{-i} \xi} \sigma_n(C) log(\sigma_n(C)) = log(|E_n|)
\end{equation}
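since each $C \in \bigvee_{i=0}^{n-1}T^{-i} \xi $ contains at most one $x \in supp(\sigma_n) = E_n$ (we write $E_n$ for $E_n^\epsilon$): the members of $\xi$ have diameter less than $\epsilon$, so two points in the same $C$ stay within $\epsilon$ of each other for the first $n$ iterates, while the points of $E_n$ are $(n, \epsilon)$-separated.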

Here things get messy, but the idea is to split $\bigvee_{i=0}^{n-1}T^{-i} \xi$ into more manageable pieces. Fix $q, n\in \mathbb{N}$ with $1 < q < n$ and let $\alpha : \{0, 1, \dots, q-1 \} \to \mathbb{N}$ be given by $\alpha(j) = \left[\frac{n-j}{q} \right]$ (i.e. the largest integer less than or equal to $\frac{n-j}{q}$). For each such $j$ we decompose the set
$$\{0, 1, \dots, n-1 \} = \{ j + rq + i ~| ~ 0 \leq r \leq \alpha(j) - 1, ~ 0 \leq i \leq q-1 \} \cup S$$
where $S = \{ 0, 1, \dots, j-1, j+\alpha(j)q, j+\alpha(j)q+1, \dots, n-1 \}$. By construction we have $|S| \leq 2q$. From this we deduce that
$$ \bigvee_{i=0}^{n-1}T^{-i} \xi = \bigvee_{r=0}^{\alpha(j) -1}T^{-(rq + j)} \left( \bigvee_{i=0}^{q-1}T^{-i} \xi \right) \vee \left( \bigvee_{l\in S}T^{-l} \xi \right)$$
since every index in $\{0, \dots, n-1\}$ appears on the right hand side. Even if some index were repeated, for any measurable partition $\xi$ we have $\xi \vee \xi = \xi$, hence repetition would not affect the join.
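The index bookkeeping is easy to get wrong, so here is a small sanity check in Python. It is only a sketch and assumes the decomposition in the form $\{ j + rq + i ~|~ 0 \leq r < \alpha(j), ~0 \leq i < q \} \cup S$ with $\alpha(j)$ the integer part of $\frac{n-j}{q}$. The function names `decompose` and `check` are made up for this illustration.

```python
def decompose(n, q, j):
    """Split {0, ..., n-1} into q-blocks starting at j, plus a remainder S."""
    a = (n - j) // q  # alpha(j): integer part of (n - j) / q
    blocks = {j + r * q + i for r in range(a) for i in range(q)}
    S = set(range(j)) | set(range(j + a * q, n))
    return blocks, S


def check(n, q):
    """Verify the decomposition for every offset j = 0, ..., q-1."""
    for j in range(q):
        blocks, S = decompose(n, q, j)
        assert blocks | S == set(range(n))  # nothing is missing
        assert blocks.isdisjoint(S)         # nothing is counted twice
        assert len(S) <= 2 * q              # the remainder is small
    # The block starting points r*q + j are distinct and lie in {0, ..., n-1};
    # this is what allows summing over p = 0, ..., n-1 after summing over j.
    starts = [j + r * q for j in range(q) for r in range((n - j) // q)]
    assert len(starts) == len(set(starts))
    assert all(0 <= p < n for p in starts)
    return True


ok = check(23, 5)
```

The check passes for any $1 < q < n$, which is all the estimate below needs.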
Now

\begin{align*}
\log(|E_n|) & = H_{\sigma_n}(\bigvee_{i=0}^{n-1}T^{-i} \xi) \\
& = H_{\sigma_n}\left( \bigvee_{r=0}^{\alpha(j) -1}T^{-(rq + j)} \left( \bigvee_{i=0}^{q-1}T^{-i} \xi \right) \vee \left( \bigvee_{l\in S}T^{-l} \xi \right)\right) \\
& \leq \sum_{r= 0}^{\alpha(j)- 1} H_{\sigma_n}\left( T^{-(rq + j)} \left( \bigvee_{i=0}^{q-1}T^{-i} \xi \right) \right) + \sum_{l\in S} H_{\sigma_n}(T^{-l}\xi) \\
& \leq \sum_{r=0}^{\alpha(j) - 1} H_{\sigma_n \circ T^{-(rq + j)}} \left( \bigvee^{q-1}_{i= 0}T^{-i}\xi \right) + 2q \log(k)
\end{align*}
where the two inequalities follow by Proposition 3 (2) and (7) respectively, together with $H_{\sigma_n}(T^{-l}\xi) \leq \log(k)$ and $|S| \leq 2q$.
If we sum over all $j$ in the range $0\leq j\leq q-1$ we end up with

\begin{align*}
q \log(|E_n|) & \leq \sum_{j= 0}^{q-1} \sum_{r=0}^{\alpha(j)-1} H_{\sigma_n \circ T^{-(rq + j)}}\left( \bigvee_{i=0}^{q-1} T^{-i} \xi \right) + 2q^2 \log(k)\\
& \leq \sum_{p= 0}^{n-1} H_{\sigma_n \circ T^{-p}}\left( \bigvee_{i=0}^{q-1} T^{-i} \xi \right) + 2q^2 \log(k)
\end{align*}
where the second inequality holds because the numbers $rq + j$ are distinct and all lie in $\{0, \dots, n-1\}$.

Dividing both sides by $n$, passing to the subsequence chosen above (written $n_j$ here) and letting $j\to \infty$ gives

\begin{align*}
q \limsup_{n\to \infty} \frac{\log(N_d(T, n, \epsilon))}{n} & = q \lim_{j\to \infty} \frac{\log(N_d(T, n_j, \epsilon))}{n_j}\\
& \leq \limsup_{j\to \infty} \left( \frac{1}{n_j}\sum_{p= 0}^{n_j-1} H_{\sigma_{n_j} \circ T^{-p}}\left( \bigvee_{i=0}^{q-1} T^{-i} \xi \right) + \frac{2q^2 \log(k)}{n_j} \right)\\
& \leq \lim_{j\to \infty} \left( H_{\mu_{n_j}}\left( \bigvee_{i=0}^{q-1} T^{-i} \xi \right) + \frac{2q^2 \log(k)}{n_j} \right)\\
& = H_{\mu}\left( \bigvee_{i=0}^{q-1} T^{-i} \xi \right)
\end{align*}
In the second inequality we used that the entropy of a fixed partition is concave as a function of the measure (because $x \mapsto -x\log(x)$ is concave), so that
\begin{align*}
\frac{1}{n_j}\sum_{p=0}^{n_j-1}H_{\sigma_{n_j}\circ T^{-p}}\left( \bigvee_{i=0}^{q-1} T^{-i} \xi \right) & \leq H_{\frac{1}{n_j}\sum_{p=0}^{n_j - 1} \sigma_{n_j}\circ T^{-p}}\left( \bigvee_{i=0}^{q-1} T^{-i} \xi \right) \\ & = H_{\mu_{n_j}} \left( \bigvee_{i=0}^{q-1} T^{-i} \xi \right)
\end{align*}
and in the last step that the members of $\bigvee_{i=0}^{q-1} T^{-i} \xi$ have boundaries of $\mu$-measure zero by Lemma 1 (3), so the entropies converge along the subsequence.
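The comparison between the average of the entropies and the entropy of the averaged measure rests on concavity of $x \mapsto -x\log(x)$. A quick numerical illustration in Python, on a hypothetical partition with six cells and ten randomly generated probability vectors (nothing here is specific to $\sigma_n$; the names are invented for the sketch):

```python
import math
import random


def partition_entropy(masses):
    """H = -sum m*log(m) over the cells, with the convention 0*log(0) = 0."""
    return -sum(m * math.log(m) for m in masses if m > 0)


def random_measure(cells):
    """A random probability vector assigning a mass to each cell."""
    weights = [random.random() for _ in range(cells)]
    total = sum(weights)
    return [w / total for w in weights]


random.seed(0)  # fixed seed so the run is reproducible
measures = [random_measure(6) for _ in range(10)]

# Average of the entropies versus entropy of the averaged measure.
mean_entropy = sum(partition_entropy(m) for m in measures) / len(measures)
averaged = [sum(m[c] for m in measures) / len(measures) for c in range(6)]
entropy_of_mean = partition_entropy(averaged)
```

By concavity, `mean_entropy` never exceeds `entropy_of_mean`, whatever measures are drawn.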

Now dividing by $q$ and letting $q\to \infty$ on the right hand side we finally get the following inequality
$$ \limsup_{n\to \infty} \frac{\log(N_d(T, n, \epsilon))}{n} \leq h_\mu(T, \xi) \leq h_\mu(T).$$
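A standard example where both sides of the Variational Principle can be computed by hand is the full shift on $k$ symbols. This example is not part of the proof above; it relies on the well-known facts that the shift has topological entropy $\log(k)$ and that the Bernoulli measure with weights $(p_1, \dots, p_k)$ has entropy $-\sum p_i \log(p_i)$. The sketch below compares a few hypothetical weight vectors for $k = 3$.

```python
import math


def bernoulli_entropy(p):
    """Entropy of the Bernoulli shift with weights p: -sum p_i*log(p_i)."""
    return -sum(x * math.log(x) for x in p if x > 0)


k = 3
h_top = math.log(k)  # topological entropy of the full shift on k symbols

# Hypothetical invariant measures, given by their weight vectors.
candidates = {
    "uniform": [1 / 3, 1 / 3, 1 / 3],
    "biased": [0.7, 0.2, 0.1],
    "degenerate": [1.0, 0.0, 0.0],
}
entropies = {name: bernoulli_entropy(p) for name, p in candidates.items()}
```

Every value in `entropies` is bounded by `h_top`, and the uniform weights attain it, so in this example the supremum in the Variational Principle is a maximum.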

– The Degree formula –

This section is devoted to the proof of a theorem due to Misiurewicz, Przytycki and Gromov, which relates the topological entropy of a holomorphic map on the complex projective space $\mathbb{P}^n$ to the degree of the map. Here is the first result, whose proof is a shameless copy of the proof of Theorem 8.3.1 in [4].

Theorem 5 (Misiurewicz-Przytycki)
Let $M$ be a smooth compact orientable manifold and $f: M \to M$ a continuously differentiable map, then $h_d(f) \geq log(deg(f))$.

Proof
We will, for any $\alpha \in (0, 1)$ and for a clever choice of $\delta$, construct an $(n, \delta)$-separated subset of $M$ with cardinality greater than or equal to $N^{\alpha n}$, where $N := deg(f)$. If we can produce such a set, we get that
\begin{align*}
h_{d}(f) & = \lim_{\delta \to 0} \limsup_{n\to \infty} \frac{\log(N_d(f, n, \delta))}{n} \\ & \geq \lim_{\delta \to 0} \limsup_{n\to \infty} \frac{\log( N^{\alpha n})}{n} = \alpha \log(N)
\end{align*}
and the conclusion follows, since $\alpha$ was arbitrary.

Let $L = \sup_{x\in M} |Jf(x)|$, where $Jf$ denotes the Jacobian determinant of $f$, fix any $\alpha \in (0, 1)$ and define $\epsilon = L^{-\alpha/(1 - \alpha)}$ (this will make sense shortly). Define the compact set
$$B= \{ x\in M ~ | ~ |Jf(x)| \geq \epsilon \}$$
and, using the inverse function theorem, cover $B$ by open sets where $f$ is injective. Let $\delta$ be a Lebesgue number of this covering. That is, in every $\delta$-disk of $B$ the function $f$ is injective. Define
$$A = \{ x\in M ~ |~ card(B\cap \{x, f(x), \dots, f^{n-1}(x) \}) \leq \alpha n \}$$
Note that by our choice of $\epsilon$ we have that $|Jf^n(x)| < 1$ for all $x\in A$: at most $\alpha n$ of the points $x, f(x), \dots, f^{n-1}(x)$ lie in $B$, where $|Jf| \leq L$, and at the remaining points $|Jf| < \epsilon$, so by the chain rule
$$|Jf^n(x)| = \prod_{i=0}^{n-1} |Jf(f^i(x))| < \epsilon^{(1-\alpha)n} L^{\alpha n} = 1$$
hence $f^n$ strictly decreases volume on $A$, so $Vol_\omega(f^n(A)) < Vol_\omega(M)$ and $M\backslash f^n(A)$ has positive volume, where $\omega$ is a normalized volume form. Since the critical values have measure zero (Sard’s theorem) there exists a regular value $x\in M\backslash f^n(A)$, and regular values have at least $N$ preimages. For a point $y$ define the set

$$Q^y = \begin{cases} f^{-1}(\{y\}) & \text{if } f^{-1}(\{y\}) \subset B \\ \text{a single point of } f^{-1}(\{y \})\text{ outside of } B & \text{else.} \end{cases}$$
Now we construct the $(n, \delta)$-separated set by the following induction process
\begin{align*}
Q_1 &= Q^x\\
Q_2 &= \bigcup_{y\in Q_1} Q^y\\
&\;\;\vdots\\
Q_n &= \bigcup_{y\in Q_{n-1}} Q^y
\end{align*}
We will show $Q_n$ is the set we are looking for.

$Q_n$ is $(n, \delta)$-separated: Let $y_1, y_2 \in Q_n$, and assume for contradiction that $d_f^n (y_1, y_2) = \max_{i = 0, 1, \dots, n-1} d(f^i(y_1), f^{i}(y_2)) < \delta$. Assume first that $f^{n-1}(y_1)$ and $ f^{n-1}(y_2)$ are distinct points in $Q_1$. By construction we must then have that $Q_1 \subset B$, but since $d( f^{n-1}(y_1), f^{n-1}(y_2)) < \delta$, and $f$ is injective on all $\delta$-discs in $B$, we must have $f(f^{n-1}(y_1)) \neq f(f^{n-1}(y_2))$, which is impossible since $f^{n-1}(y_1)$ and $f^{n-1}(y_2)$ are in the same fiber (namely $f^{-1}(\{x\})$). Hence $f^{n-1}(y_1) = f^{n-1}(y_2)$. Continuing the process inductively we eventually get $y_1 = y_2$.

To show $Q_n$ is large enough, first note that $Q_n \cap A = \emptyset$ since $Q_n \subset f^{-n}(M\backslash f^n(A)) \subset M\backslash A$, so we know that for any $y\in Q_n$ we must have
$$card(B \cap \{y, f(y), \dots, f^{n-1}(y) \}) > \alpha n.$$
There are hence more than $\alpha n$ numbers $0 \leq i \leq n-1$ such that $f^i(y) \in B\cap Q_{n-i}$. For each such number there are, by construction, at least $N$ distinct elements in $Q_{n-i}$ in the same fiber as $f^{i}(y)$. Each of these elements has at least one preimage in $Q_n$. This shows that $card(Q_n) \geq N^{\alpha n}$ and concludes the proof.
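To see the construction at work in the simplest possible case, take the circle map $\theta \mapsto N\theta \mod 1$ (that is, $z \mapsto z^N$ on the unit circle). This example is my own choice and not taken from [4]. Here $|Jf| = N$ everywhere, so $B$ is the whole circle, every $Q^y$ is the full fiber $f^{-1}(\{y\})$, and $f$ is injective on arcs shorter than $\frac{1}{N}$. The Python sketch below builds $Q_n$ with exact rational arithmetic and measures how separated it is; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
import math


def f(y, N):
    """The circle map theta -> N*theta mod 1."""
    return (N * y) % 1


def preimages(y, N):
    """The N points mapped to y by f."""
    return [(y + j) / N for j in range(N)]


def circle_dist(a, b):
    """Distance on the circle R/Z."""
    d = abs(a - b) % 1
    return min(d, 1 - d)


def bowen_dist(a, b, n, N):
    """d_f^n(a, b) = max over i < n of d(f^i(a), f^i(b))."""
    best = Fraction(0)
    for _ in range(n):
        best = max(best, circle_dist(a, b))
        a, b = f(a, N), f(b, N)
    return best


def build_Q(x, n, N):
    """Q_n from the induction above, with Q^y equal to the whole fiber."""
    Q = [x]
    for _ in range(n):
        Q = [z for y in Q for z in preimages(y, N)]
    return Q


N, n = 2, 5
Q_n = build_Q(Fraction(0), n, N)
min_sep = min(bowen_dist(a, b, n, N) for a, b in combinations(Q_n, 2))
growth_rate = math.log(len(Q_n)) / n
```

In this example $Q_n$ has exactly $N^n$ points, any two of which are at $d_f^n$-distance at least $\frac{1}{N}$, so `growth_rate` equals $\log(N)$ up to rounding.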

Next we specialise to the case where the manifold is the complex projective space $\mathbb{P}^n$.

Theorem 6 (Gromov)
Let $f:\mathbb{P}^n \to \mathbb{P}^n$ be a holomorphic map, then $$h(f) \leq log(deg(f))$$
Proof
Let $$\Gamma_m = \{ (x, f(x), \dots, f^{m-1}(x)) ~|~ x \in \mathbb{P}^n \} \subset \underbrace{\mathbb{P}^n \times \dots \times \mathbb{P}^n}_{m} =: \prod^m \mathbb{P}^n.$$
Write $\tilde{x} = (x_0, \dots, x_{m-1})$ for the points of $\prod^m \mathbb{P}^n $, and let $d^m$ and $d_{+}^m$ be the metrics on $\Gamma_m \subset \mathbb{P}^n \times \dots \times \mathbb{P}^n$ defined by

\begin{align*}
d_{+}^m(\tilde{x}, \tilde{y}) &= \sum_{i=0}^{m-1} d(x_i, y_i)\\
d^m(\tilde{x}, \tilde{y}) & = \max_{i=0, \dots, m-1} \{ d(x_i, y_i)\}.
\end{align*}

These metrics induce the same topology, since $d^m \leq d_+^m \leq m d^m$. Note that the metric $d_f^m$ defined in section “metric entropy” coincides with the metric $d^m$, by this we mean that
$$d_f^m(x, y) = \max_{i=0, \dots, m-1}\{ d(f^i(x), f^i(y)) \} = d^m(\tilde{x}, \tilde{y})$$
where $\tilde{x} = (x, f(x), \dots, f^{m-1}(x))$ and $\tilde{y}=(y, f(y), \dots, f^{m-1}(y))$ are in $\Gamma_m$.
Next we will need a result from geometry, which says that for any $\tilde{x} = (x_0, x_1, \dots, x_{m-1}) \in \Gamma_m$ we have a lower bound on the volume of the disc $B^{m, +}_\epsilon(\tilde{x}) = \{ \tilde{y}=(y_0, \dots, y_{m-1}) \in \Gamma_m ~|~ d^m_{+}(\tilde{y}, \tilde{x}) < \epsilon \}$ which is independent of both $\tilde{x}$ and $m$. That is $$Vol_\omega(B^{m, +}_\epsilon(\tilde{x})) \geq C_\epsilon, $$ where $C_\epsilon$ depends on $\epsilon$ but not on $\tilde{x}$ or $m$, and $\omega$ is the (normalized) volume form associated with the Riemannian metric $d_{+}^m$. Be warned that this result requires some sort of compatibility between the metric and the complex structure, which $d^m$ does not satisfy.

Note that, since $d_{+}^m \geq d^m$, we have $B^{m, +}_\epsilon(\tilde{x}) \subset B^{m}_\epsilon(\tilde{x}) := \{ \tilde{y} \in \Gamma_m ~|~ d^m(\tilde{x}, \tilde{y}) <\epsilon \}.$ Let $E_m^\epsilon$ be a maximal $(m, \epsilon)$-separated set in $\mathbb{P}^n$. Then the balls in the collection $\{ B^{m}_{\epsilon/2}(\tilde{x})~ |~ x\in E_m^\epsilon, ~ \tilde{x} = (x, f(x), \dots, f^{m-1}(x)) \}$ are pairwise disjoint, and hence so are the smaller balls $\{ B^{m,+}_{\epsilon/2}(\tilde{x})~ |~ x\in E_m^\epsilon, ~ \tilde{x} = (x, f(x), \dots, f^{m-1}(x)) \}$. So we have
$$Vol_\omega(\Gamma_m) \geq \sum_{x\in E_m^\epsilon} Vol_{\omega}(B^m_{\epsilon/2}(\tilde{x})) \geq \sum_{x\in E_m^\epsilon} Vol_{\omega}(B^{m, +}_{\epsilon/2}(\tilde{x})) \geq C_{\epsilon/2} N_d(f, m, \epsilon).$$
To lighten the notation we write $C_\epsilon$ for the constant $C_{\epsilon/2}$ from here on.

Now $Vol(\Gamma_m)$ can be computed by pulling back to $\mathbb{P}^n$ along the iterates of $f$, using $\int_{\mathbb{P}^n}|Jf^i| \omega = \int_{\mathbb{P}^n}(f^i)^*\omega $, where $\omega$ is a normalized volume form on $\mathbb{P}^n$. By inspection we have
\begin{equation}
\label{eq:2}
Vol(\Gamma_m) = \sum^{m-1}_{i=0} \int_{\mathbb{P}^n}|Jf^i| \omega = \sum^{m-1}_{i=0} N^i
\end{equation}
where $N := deg(f)$ and the last equality follows from algebraic black magic.
Inserting this into the entropy formula, we get
\begin{align*}
h_d(f) & = \lim_{\epsilon \to 0} \limsup_{m\to \infty} \frac{\log(N_d(f, m, \epsilon))}{m} \\
& \leq \lim_{\epsilon \to 0} \limsup_{m\to \infty} \frac{\log( \frac{\sum_{i=0}^{m-1} N^i}{C_\epsilon}) }{m} \\
& = \lim_{\epsilon \to 0} \limsup_{m\to \infty} \frac{\log(\sum_{i=0}^{m-1} N^i )}{m} \\
& = \lim_{\epsilon \to 0} \limsup_{m\to \infty} \frac{\log(\frac{N^m - 1}{N - 1})}{m} \qquad \text{(geometric series, $N \geq 2$)}\\
& = \log(N)
\end{align*}
For $N = 1$ the sum equals $m$ and the limit is $0 = \log(1)$, so the bound holds in that case as well.
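The last limit is easy to check numerically. The Python sketch below evaluates $\frac{1}{m}\log(\sum_{i=0}^{m-1} N^i)$ for the hypothetical choice $N = 4$ and a few values of $m$, using exact integer arithmetic for the sum.

```python
import math


def normalized_log_sum(N, m):
    """(1/m) * log(1 + N + ... + N^(m-1)), with the sum computed exactly."""
    total = sum(N ** i for i in range(m))  # Python integers do not overflow
    return math.log(total) / m


N = 4
values = {m: normalized_log_sum(N, m) for m in (10, 100, 1000)}
limit = math.log(N)
```

The values increase towards `limit` from below, and the gap is roughly $\frac{\log(N-1)}{m}$, which is why the constant $C_\epsilon$ and the geometric sum both disappear in the limit.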

Corollary 1
Let $f: \mathbb{P}^n \to \mathbb{P}^n$ be a holomorphic map, then
$$h_d(f) = log(deg(f)).$$
Proof
Immediate from Theorems 5 and 6.

–  Examples –

To wrap things up, here are some useful examples to keep in mind. Any isometry or contractive map $f$ of a metric space has zero entropy, whether topological, metric or measure theoretic (the measures on topological spaces are always assumed to be regular Borel measures). This is most easily verified for the metric entropy: since $d_f^n = d$, the number $N_d(f, n, \epsilon)$ is constant as $n$ increases. The result extends to the measure theoretic entropy and topological entropy by applying Theorem 1 and Theorem 2 respectively.
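The contrast between an isometry and an expanding map can be made concrete on the circle $\mathbb{R}/\mathbb{Z}$. The Python sketch below greedily picks $(n, \epsilon)$-separated points from a finite grid, once for a rotation and once for the doubling map $x \mapsto 2x \mod 1$. Several things are assumptions of the sketch rather than part of the text: the grid of 64 points, the rotation angle $\frac{1}{7}$, the value $\epsilon = \frac{1}{4}$, the convention that separated means $d_f^n \geq \epsilon$, and the greedy selection, which only gives a lower bound for $N_d(f, n, \epsilon)$.

```python
from fractions import Fraction


def circle_dist(a, b):
    """Distance on the circle R/Z."""
    d = abs(a - b) % 1
    return min(d, 1 - d)


def bowen_dist(f, a, b, n):
    """d_f^n(a, b) = max over i < n of d(f^i(a), f^i(b))."""
    best = Fraction(0)
    for _ in range(n):
        best = max(best, circle_dist(a, b))
        a, b = f(a), f(b)
    return best


def greedy_separated(f, n, eps, grid):
    """Scan the grid and keep each point that is eps-far from all kept ones."""
    chosen = []
    for x in grid:
        if all(bowen_dist(f, x, y, n) >= eps for y in chosen):
            chosen.append(x)
    return chosen


def rotation(x):
    return (x + Fraction(1, 7)) % 1  # an isometry of the circle


def doubling(x):
    return (2 * x) % 1  # expands distances by a factor 2


grid = [Fraction(k, 64) for k in range(64)]
eps = Fraction(1, 4)
rotation_counts = [len(greedy_separated(rotation, n, eps, grid)) for n in range(1, 6)]
doubling_counts = [len(greedy_separated(doubling, n, eps, grid)) for n in range(1, 6)]
```

For the rotation the count does not change with $n$, in line with $d_f^n = d$, while for the doubling map it grows with $n$ until the grid is exhausted.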

Previously it was shown that if $f: X\to X$ is a continuous map on a compact metric space $X$, and the family $\{ f^n \}_{n=0}^\infty$ is (uniformly) equicontinuous, then $f$ has zero entropy.

In both the above examples the Fatou set of the family $\{ f^n \}_{n=0}^\infty$ is the whole domain. From this observation we can at least deduce that if a holomorphic map on a compact metrizable space has non-zero entropy, then its Julia set must be non-empty. This doesn’t really give us that much in the case of $\mathbb{P}^1$, since for any holomorphic map $f$ with $deg(f) \geq 2$ we know that $J(f) \neq \emptyset$, and for $deg(f) = 1$ we have $h(f)= \log(1) = 0$.

In general not that much can be said about the entropy of a map given the size and shape of its Julia set. Take for instance the maps $f, g : \mathbb{P}^1 \to \mathbb{P}^1$ defined by $f(z) = \frac{(z^2 + 1)^2}{4z(z^2 -1 )}$ (see [5], problem 7-g) and $g(z) = z^4$. It is known that the Julia set of $f$ is the entire sphere, and the Julia set of $g$ is the unit circle centered at zero. But both maps have degree 4, so the entropy is $h_{top}(f) = h_{top}(g) = \log(4)$.
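By Corollary 1, computing the entropy of a rational map of $\mathbb{P}^1$ comes down to reading off its degree, which is the larger of the degrees of numerator and denominator once common factors are cancelled. The Python sketch below does this for the two maps above. It assumes the fractions are already in lowest terms (true here, since the numerator of $f$ vanishes at $\pm i$ and the denominator at $0, \pm 1$) and does not check this; the helper names are hypothetical.

```python
import math


def poly_degree(coeffs):
    """Degree of a polynomial given by coefficients, lowest power first."""
    degree = -1  # convention for the zero polynomial
    for power, c in enumerate(coeffs):
        if c != 0:
            degree = power
    return degree


def rational_degree(num, den):
    """Degree of num/den as a map of the sphere, assuming lowest terms."""
    return max(poly_degree(num), poly_degree(den))


# f(z) = (z^2 + 1)^2 / (4z(z^2 - 1)) = (1 + 2z^2 + z^4) / (-4z + 4z^3)
f_num, f_den = [1, 0, 2, 0, 1], [0, -4, 0, 4]
# g(z) = z^4
g_num, g_den = [0, 0, 0, 0, 1], [1]

deg_f = rational_degree(f_num, f_den)
deg_g = rational_degree(g_num, g_den)
entropy_f = math.log(deg_f)
entropy_g = math.log(deg_g)
```

Both degrees come out as 4, so the two maps share the entropy $\log(4)$ even though their Julia sets look nothing alike.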

– Sources –

[1] Walters, Peter. An introduction to ergodic theory. Vol. 79. Springer Science & Business Media, 2000.
[2] Adler, Roy L., Alan G. Konheim, and M. Harry McAndrew. “Topological entropy.” Transactions of the American Mathematical Society 114.2 (1965): 309-319.
[3] Bowen, Rufus. “Entropy for group endomorphisms and homogeneous spaces.” Transactions of the American Mathematical Society 153 (1971): 401-414.
[4] Katok, Anatole, and Boris Hasselblatt. Introduction to the modern theory of dynamical systems. Vol. 54. Cambridge university press, 1997.
[5] Milnor, John Willard. Dynamics in one complex variable. Vol. 160. Princeton: Princeton University Press, 2006.
