# KalaiSastry08


## Algorithms

### Perceptron Algorithm

Input: $T \geq 1$, $(x_1,y_1),\ldots,(x_m,y_m) \in \reals^n \times \{0,1\}$.

* Let $w^1=0$.
* For $t=1,2,\ldots,T$:
** Let $w^{t+1}:=w^t+\frac{1}{m} \sum_{i=1}^m \left(y_i-I(w^t \cdot x_i \geq 0)\right)x_i$.
* Output: $h(x)=I(w^{T+1} \cdot x \geq 0)$.

### The Isotron Algorithm

Input: $T \geq 1$, $(x_1,y_1),\ldots,(x_m,y_m) \in \reals^n \times [0,1]$.

* Let $w^1=0$.
* For $t=1,2,\ldots,T$:
** Let $u^t:=\mathrm{PAV}\left((w^t \cdot x_1,y_1),\ldots,(w^t \cdot x_m,y_m)\right)$.
** Let $w^{t+1}:=w^t+\frac{1}{m} \sum_{i=1}^m \left(y_i-u^t(w^t \cdot x_i)\right)x_i$.
* Output: $h(x)=u^T(w^T \cdot x)$.

### The Isotron Algorithm (online version)

Input: $T \geq 1$ and, for each $t=1,2,\ldots,T$, a fresh sample $(x^t_1,y^t_1),\ldots,(x^t_m,y^t_m) \in \reals^n \times [0,1]$.

* Let $w^1=0$.
* For $t=1,2,\ldots,T$:
** Let $u^t:=\mathrm{PAV}\left((w^t \cdot x^t_1,y^t_1),\ldots,(w^t \cdot x^t_m,y^t_m)\right)$.
** Let $w^{t+1}:=w^t+\frac{1}{m} \sum_{i=1}^m \left(y^t_i-u^t(w^t \cdot x^t_i)\right)x^t_i$.
* Output: $h(x)=u^T(w^T \cdot x)$.

### PAV (Pool Adjacent Violators)

Input: $(x_1,y_1),\ldots,(x_m,y_m) \in \reals \times \reals$.

* Sort the data so that $x_1 \leq x_2 \leq \ldots \leq x_m$ and, if $x_i=x_{i+1}$, then $y_i \geq y_{i+1}$, for all $i=1,2,\ldots,m-1$.
* Let $\hat{y}_1=y_1$.
* For $i=2,\ldots,m$:
** If $y_i \geq \hat{y}_{i-1}$, let $\hat{y}_i=y_i$; else:
*** Let $j$ be the largest index such that $\hat{y}_{j-1} \leq \frac{1}{i-j+1}\sum_{k=j}^{i} y_k$ (taking $j=1$ if no such index exists), and set $\hat{y}_j=\hat{y}_{j+1}=\ldots=\hat{y}_i$ equal to this average.
* Output: the nondecreasing fit $\hat{y}_1,\ldots,\hat{y}_m$.
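The PAV subroutine and a single Isotron update can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code: the names `pav` and `isotron_update` are chosen here, and the tie-breaking rule for equal $x_i$'s is omitted.

```python
import numpy as np

def pav(y):
    """Pool Adjacent Violators: least-squares nondecreasing fit to y
    (y is assumed already sorted by the corresponding x-values)."""
    y = list(map(float, y))
    # Each block is [sum, count]; merge backwards while the last two
    # blocks' averages violate monotonicity.
    blocks = []
    for v in y:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return np.array(fit)

def isotron_update(w, X, y):
    """One Isotron step: fit u^t by PAV along the direction w, then take
    a perceptron-like step on w."""
    m = len(y)
    order = np.argsort(X @ w)       # sort the sample by the projection w . x_i
    u = np.empty(m)
    u[order] = pav(y[order])        # z_i = u^t(w . x_i) on the sample
    return w + (X.T @ (y - u)) / m  # w^{t+1} = w^t + (1/m) sum_i (y_i - z_i) x_i
```

Iterating `isotron_update` for $T$ steps, and interpolating the final fitted values into a nondecreasing step function $u^T$, gives the batch Isotron's output hypothesis.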

## Regression/probabilistic prediction model

### Lemma Cherry

 Lemma Cherry. Let $y_1,y_2,\ldots,y_m \in \reals$ be a sequence and let $z_1\leq z_2\leq \ldots\leq z_m$ be the unique minimizer of $\min_{z_1 \leq z_2 \leq \ldots \leq z_m} \sum_{i=1}^m (z_i - y_i)^2$. Let $b_1 \leq b_2 \leq \ldots \leq b_m$ be another nondecreasing sequence. Then, $\sum_{i=1}^m (z_i-y_i)(b_i-z_i) \geq 0$.

Proof. Let $\lambda \in (0,1]$. Note that the sequence $\lambda b_i + (1-\lambda)z_i$ is also nondecreasing and $z_i$ minimizes the sum of squared differences. Hence,

$\sum_i (\lambda b_i + (1-\lambda) z_i-y_i)^2 - \sum_i (z_i-y_i)^2 \geq 0.$

Collecting sums and using difference of squares gives,

$\sum_i (\lambda b_i-\lambda z_i)(\lambda b_i + (1-\lambda) z_i-y_i + z_i-y_i ) \geq 0$

Dividing through by the positive quantity $2 \lambda$ gives,

$\sum_i (b_i-z_i)\left(\frac{\lambda}{2}(b_i-z_i)+(z_i-y_i)\right)\geq 0$.

Since the above holds for all $\lambda \in (0,1]$, letting $\lambda \to 0$ gives $\sum_i (b_i-z_i)(z_i-y_i) \geq 0$, which is the claim. Q.E.D.
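As a concrete instance of Lemma Cherry, take $y=(3,1,2)$, whose isotonic least-squares fit is $z=(2,2,2)$; the sum collapses to $\sum_i (z_i-y_i)(b_i-z_i)=b_2-b_1 \geq 0$ for any nondecreasing $b$. A quick check (the sequences are chosen here for illustration):

```python
# Lemma Cherry on a 3-point example: z is the isotonic (PAV) fit of y, and
# the inner product of (z - y) with (b - z) is nonnegative for every
# nondecreasing sequence b; here it equals b[1] - b[0].
y = [3.0, 1.0, 2.0]
z = [2.0, 2.0, 2.0]  # least-squares nondecreasing fit of y
for b in ([0, 1, 2], [2, 2, 2], [-1, 0, 5], [2, 3, 4]):
    s = sum((zi - yi) * (bi - zi) for zi, yi, bi in zip(z, y, b))
    assert s == b[1] - b[0] and s >= 0
```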

### Lemma Blueberry

 Lemma Blueberry. ${\mathbb E}\left[\|w^{t+1}\|^2- \|w^t\|^2 \right] \leq .$

Proof. For notational ease, we write $x_i,y_i,z_i$ instead of $x^t_i,y^t_i,z^t_i$ throughout this proof.

By expansion,

$\|w^{t+1}\|^2 = \|w^t\|^2+\frac{2}{m} \sum_{i=1}^m (y_i - z_i)\, x_i \cdot w^t + \left\|\frac{1}{m}\sum_{i=1}^m (y_i-z_i) x_i\right\|^2.$

Now, by Property Pumpernickle of the PAV algorithm, we have that $\sum_{i=1}^m (y_i-z_i)\, x_i \cdot w^t=0$. Hence it remains to bound

$\left\|\frac{1}{m}\sum_{i=1}^m (y_i-z_i) x_i\right\|^2.$

Now,

$\left\|\frac{1}{m}\sum_{i=1}^m (y_i-z_i) x_i\right\|^2=\|a+b\|^2$, for $a=\frac{1}{m}\sum_{i=1}^m (f(x_i)-z_i) x_i$ and $b=\frac{1}{m}\sum_{i=1}^m(y_i-f(x_i))x_i$.

Now,

${\mathbb E}[\|b\|^2] = \frac{1}{m^2} \sum_{1\leq i,j \leq m} {\mathbb E}\left[(y_i-f(x_i))(y_j-f(x_j))\, x_i \cdot x_j\right] = \frac{1}{m^2}\sum_{i=1}^m {\mathbb E}\left[(y_i-f(x_i))^2\|x_i\|^2\right].$ The cross terms above are 0 in expectation by independence. Since $(y_i-f(x_i))^2\leq 1$ and $\|x_i\|^2\leq 1$, we have that ${\mathbb E}[\|b\|^2] \leq 1/m$ and, by Jensen's inequality, ${\mathbb E}[\|b\|] \leq 1/\sqrt{m}$.
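The bound ${\mathbb E}[\|b\|^2] \leq 1/m$ can be sanity-checked by simulation. Below is an illustrative Monte Carlo sketch with scalar $x_i$ and an assumed regression function $f$ (both chosen here for the example), where $y_i = f(x_i)$ plus mean-zero bounded noise:

```python
import random

random.seed(1)

# Monte Carlo estimate of E[b^2] for b = (1/m) sum_i (y_i - f(x_i)) x_i,
# with |y_i - f(x_i)| <= 1 and |x_i| <= 1; the lemma gives E[b^2] <= 1/m.
m, trials = 50, 4000
f = lambda x: 0.5 * x + 0.5          # an arbitrary choice for illustration
total = 0.0
for _ in range(trials):
    b = 0.0
    for _ in range(m):
        x = random.uniform(-1.0, 1.0)
        y = f(x) + random.uniform(-0.25, 0.25)   # E[y | x] = f(x)
        b += (y - f(x)) * x
    total += (b / m) ** 2
assert total / trials <= 1.0 / m
```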

Now let $\bar{y}_i$ be the sequence that would have been output had we actually observed $(x_i,f(x_i))$ instead of $(x_i,y_i)$. That is,

$\bar{y}_1,\ldots,\bar{y}_m=\mathrm{PAV}\left((x_1 \cdot w^t,f(x_1)),\ldots,(x_m\cdot w^t,f(x_m))\right)$. By Lemma

Q.E.D.

### Lemma Cantelope

 Lemma Cantelope. ${\mathbb E}\left[w^{t+1}\cdot w^* - w^t \cdot w^*\right] \geq \frac{1}{L} \overline{err}_t-\left(1+\frac{1}{L}\right)2\sqrt{\frac{2}{m}}.$

Proof.

$\left(w^{t+1}-w^t\right)\cdot w^* = \frac{1}{m}\sum_{i=1}^m\left(y^t_i-z^t_i\right) x^t_i \cdot w^*,$

which can be written as,

$\frac{1}{m}\sum_{i=1}^m\left[\left(f(x^t_i)-z^t_i\right)x^t_i\cdot w^* + \left(y^t_i-f(x^t_i)\right) x^t_i \cdot w^*\right].$ First, ${\mathbb E}\left[\left(y^t_i-f(x^t_i)\right) x^t_i \right]=0$, because ${\mathbb E}\left[y^t_i \mid x^t_i\right]=f(x^t_i)$. Next,

$\left(f(x^t_i)-z^t_i\right) x^t_i\cdot w^*=\left(u^*(w^*\cdot x^t_i)-z^t_i\right) x^t_i\cdot w^*.$

Now, as in the proof of Lemma Cucumber, we would like to consider the inverse of $u^*$ on the values $z^t_i$. Note that $z^t_i \in [0,1]$. There are two problems with this. First, the range of $u^*$ is an interval $[a,b]\subseteq [0,1]$ that may not contain $z^t_i$. To deal with this, we define a new function $u^{**}:[-1-\frac{1}{L},1+\frac{1}{L}]\rightarrow [0,1]$. The properties of $u^{**}$ that we require are:

* $u^{**}(s) = u^*(s)$ for all $s \in [-1,1]$.
* $u^{**}$ is $L$-Lipschitz and nondecreasing.
* $u^{**}(-1-1/L)=0$ and $u^{**}(1+1/L)=1$; i.e., the range of $u^{**}$ is the entire interval $[0,1]$.

It is not hard to see that, for the domain $[-1-1/L,1+1/L]$ we have chosen, it is always possible to extend $u^*$ (e.g., linearly) to such a function. The second problem is that the inverse of $u^{**}$ (or $u^*$) is not necessarily unique at a point $z \in [0,1]$. We consider the following function (though any inverse would do equally well):

$g:[0,1]\rightarrow [-1-1/L,1+1/L], g(z) = \inf \{x \in [-1-1/L,1+1/L]~|~u^{**}(x)=z \}.$

Since $u^{**}$ is continuous, we have that $u^{**}(g(z))=z$ for all $z \in [0,1]$.

Now, by Observation Mango,

$\left(u^*(w^*\cdot x^t_i)-z^t_i\right) (x^t_i\cdot w^*-g(z^t_i)) \geq \frac{\left(u^*(w^*\cdot x^t_i)-z^t_i\right)^2}{L}.$

Putting this together with the above, we have,

${\mathbb E}\left[\left(w^{t+1}-w^t\right)\cdot w^*\right] \geq {\mathbb E}\left[\frac{1}{Lm}\sum_{i=1}^m\left(u^*(w^*\cdot x_i^t)-z_i^t\right)^2\right]+ {\mathbb E}\left[\frac{1}{m}\sum_{i=1}^m\left(f(x_i^t)-z_i^t\right)g(z_i^t)\right].$ The first term on the RHS above is $\overline{err}_t/L$. We lower-bound the second term using Lemma Sabra, below. This gives,

${\mathbb E}\left[\frac{1}{m}\sum_{i=1}^m\left(f(x_i^t)-z_i^t\right)g(z_i^t)\right]= {\mathbb E}\left[\frac{1}{m}\sum_{i=1}^m\left(f(x_i^t)-y_i^t\right)g(z_i^t)\right] \geq -\left(2+\frac{2}{L}\right)\sqrt{\frac{2}{m}}.$ The equality holds because, on each pool of the PAV output, $z^t_i$ (and hence $g(z^t_i)$) is constant and equals the average of the $y^t_i$'s in that pool, so $\sum_i (y^t_i - z^t_i)g(z^t_i)=0$. Q.E.D.

### Lemma Tangerine

We next bound the expected maximum of the partial sums of independent, mean-zero, bounded random variables.

 Lemma Tangerine. Let $m\geq 1$ be any integer and $X_1,X_2,\ldots,X_m$ be independent random variables with ${\mathbb E}[X_i]=0$, each in the bounded range $[-1,1]$. Then, ${\mathbb E}\left[\max_{i \in \{0,1,\ldots,m\}} X_1+X_2+ \ldots+X_i\right] \leq \sqrt{2m}.$

Proof.

Let $Y$ be the random variable $Y = \max \{0,X_1,X_1+X_2,\ldots,X_1+X_2+ \ldots+X_m\}$. We claim that $\Pr[Y^2\geq t] \leq e^{-t/(2m)}$ for all $t> 0$. Since $Y\geq 0$, this is equivalent to $\Pr[Y\geq \sqrt{t}] \leq e^{-t/(2m)}$. To see this, fix $t$ and consider the martingale $Z_0,Z_1,\ldots,Z_m$, where $Z_0 = 0$ and,

$Z_{i} = \begin{cases} Z_{i-1} & \mbox{if }Z_{i-1}\geq \sqrt{t}\\ Z_{i-1}+X_{i} & \mbox{otherwise}\end{cases} \mbox{ for }i=1,2,\ldots,m.$ Since ${\mathbb E}[X_i]=0$, the $Z_i$'s form a martingale with differences bounded by 1. Hence, Azuma's inequality implies that $\Pr[Z_m \geq \sqrt{t}] \leq e^{-t/(2m)}$. Moreover, if $Y \geq \sqrt{t}$, then $Z_m \geq \sqrt{t}$: once the partial sums reach $\sqrt{t}$, the process $Z$ freezes at a value that is at least $\sqrt{t}$. Hence, we have shown that $\Pr[Y^2\geq t] \leq e^{-t/(2m)}$ for all $t> 0$. Now, for any nonnegative random variable $R\geq 0$, we have ${\mathbb E}[R]=\int_0^\infty \Pr[R \geq t]dt$. Hence,

${\mathbb E}[Y^2]=\int_0^\infty \Pr[Y^2\geq t] dt \leq \int_0^\infty e^{-t/(2m)} dt=2m.$

Finally, the fact that ${\mathbb E}[Y]^2 \leq {\mathbb E}\left[Y^2\right]$ for any random variable (non-negativity of variance) gives ${\mathbb E}[Y] \leq \sqrt{2m}$, which implies the lemma. Q.E.D.
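Lemma Tangerine can be sanity-checked by simulation; the sketch below uses Rademacher ($\pm 1$) variables, which satisfy the hypotheses, and compares the empirical mean of $Y$ against $\sqrt{2m}$:

```python
import math
import random

random.seed(0)

# Estimate E[max over partial sums] for m Rademacher steps; compare to sqrt(2m).
m, trials = 100, 2000
total = 0.0
for _ in range(trials):
    s, best = 0.0, 0.0               # best includes the empty partial sum, 0
    for _ in range(m):
        s += random.choice((-1.0, 1.0))
        best = max(best, s)
    total += best
avg_max = total / trials
assert avg_max <= math.sqrt(2 * m)   # the lemma's bound (here about 14.14)
```

For the simple random walk the expected maximum is in fact about $\sqrt{2m/\pi}$, so the lemma's bound is loose only by a constant factor.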

### Lemma Sabra

 Lemma Sabra. Let $m \geq 1$ be an integer and $a \leq b$ be reals. Let $X_1,\ldots,X_m$ be independent random variables with ${\mathbb E}[X_i]=0$, each in $[-1,1]$, and let $A_1 \leq A_2 \leq \ldots \leq A_m$ be a (possibly random) nondecreasing sequence with each $A_i \in [a,b]$. Then, $\left|{\mathbb E}\left[\frac{A_1X_1+\ldots + A_mX_m}{m}\right]\right| \leq (b-a)\sqrt{\frac{2}{m}}.$

Proof. First consider the random variables $A_i'=\frac{A_i-a}{b-a}\in [0,1]$. Then, by linearity of expectation,

${\mathbb E}\left[\frac{A_1X_1+\ldots + A_mX_m}{m}\right]=(b-a){\mathbb E}\left[\frac{A_1'X_1+\ldots + A_m'X_m}{m}\right]+a{\mathbb E}\left[\frac{X_1+\ldots + X_m}{m}\right].$

By the above and the fact that ${\mathbb E}\left[\frac{X_1+\ldots + X_m}{m}\right]=0$, it suffices to prove the lemma for $a=0$, $b=1$, so that is the case we consider. Moreover, it suffices to prove the upper bound; the lower bound follows by symmetry, replacing $X_i$ by $-X_i$. We next claim that $\sum_{i=1}^m A_i X_i \leq \max_{1 \leq j \leq m+1} \sum_{i=j}^m X_i$, where the $j=m+1$ term is the empty sum, $0$. To see this, note that:

$A_1 X_1 + \ldots + A_m X_m = \Delta_1 (X_1 +X_2+\ldots+ X_m) + \Delta_2 (X_2+X_3+\ldots+X_m) + \ldots + \Delta_m X_m + \Delta_{m+1}\cdot 0,$ where $\Delta_1=A_1$, $\Delta_2=A_2-A_1$, $\ldots$, $\Delta_m=A_m-A_{m-1}$, and $\Delta_{m+1}=1-A_m$. Since $\Delta_i\geq 0$ and $\sum_{i=1}^{m+1}\Delta_i=1$, the sum $\sum_{i=1}^m A_i X_i$ is a convex combination of the suffix sums $\sum_{i=j}^m X_i$, over $j=1,\ldots,m+1$. Hence, it is no larger than their maximum. Applying Lemma Tangerine to the reversed sequence $X_m,X_{m-1},\ldots,X_1$ gives ${\mathbb E}\left[\max_{1 \leq j \leq m+1} \sum_{i=j}^m X_i\right]\leq \sqrt{2m}$; dividing by $m$ gives the lemma. Q.E.D.
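The convex-combination step can be checked numerically: with $\Delta_1=A_1$, $\Delta_i=A_i-A_{i-1}$, and $\Delta_{m+1}=1-A_m$, the weighted sum $\sum_i A_iX_i$ equals the $\Delta$-weighted combination of the suffix sums, and is therefore at most the largest suffix sum. An illustrative check (the sequences are chosen here for the example):

```python
# Numeric check of the convex-combination decomposition in Lemma Sabra.
A = [0.1, 0.4, 0.4, 0.9]               # nondecreasing weights in [0, 1]
X = [1.0, -2.0, 3.0, -1.0]             # arbitrary realizations of the X_i
m = len(A)
lhs = sum(a * x for a, x in zip(A, X))
deltas = [A[0]] + [A[i] - A[i - 1] for i in range(1, m)] + [1 - A[-1]]
suffix = [sum(X[j:]) for j in range(m + 1)]      # suffix sums; suffix[m] = 0
assert abs(lhs - sum(d * s for d, s in zip(deltas, suffix))) < 1e-12
assert lhs <= max(suffix)              # the pathwise bound used in the proof
```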