Midterm Solutions 2023

Author: Paul Schrimpf

Published: October 25, 2023

Problem 1

Suppose $u_{i,k} \in \{0,1\}$ for $k=1,\ldots,K$, and $Y_i = \sum_{k=1}^K u_{i,k}$. Observations for different $i$ are independent and identically distributed. $Y_1,\ldots,Y_n$ are observed, but the $u_{i,k}$ are not. For the same $i$ and $j \neq k$, $u_{i,j}$ and $u_{i,k}$ are independent (but may not be identically distributed).

σ-fields

For $K=2$, what is $\sigma(Y)$?

Solution. The range of $Y$ is $\{0,1,2\}$. $\sigma(Y)$ consists of the preimages of each of these values and all their intersections and unions. That is,
$$\begin{aligned} \sigma(Y) = \big\{ & \emptyset, \{(0,0)\}, \{(1,1)\}, \{(0,1),(1,0)\}, \{(0,0),(1,1)\}, \\ & \{(0,0),(0,1),(1,0)\}, \{(0,1),(1,0),(1,1)\}, \{(0,0),(0,1),(1,0),(1,1)\} \big\}. \end{aligned}$$
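This enumeration can be checked mechanically: start from the preimages $Y^{-1}(\{y\})$ and close the collection under complements and unions. A minimal Python sketch (the sample space and closure routine below are constructed here purely for illustration):

```python
from itertools import product, combinations

# Sample space for K = 2: all (u1, u2) pairs, with Y(u1, u2) = u1 + u2
Omega = frozenset(product((0, 1), repeat=2))
Y = lambda omega: sum(omega)

# Preimages of each value of Y generate sigma(Y)
preimages = {frozenset(w for w in Omega if Y(w) == y) for y in (0, 1, 2)}

# Close under complements and pairwise unions until nothing new appears
sigma_Y = set(preimages) | {frozenset()}
while True:
    new = {Omega - A for A in sigma_Y}
    new |= {A | B for A, B in combinations(sigma_Y, 2)}
    if new <= sigma_Y:
        break
    sigma_Y |= new

for A in sorted(sigma_Y, key=lambda A: (len(A), sorted(A))):
    print(sorted(A))
print(len(sigma_Y))  # 8 sets, matching the list above
```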

Identification

  1. Let $\theta_k = P(u_{i,k} = 1)$. Show that $\theta = (\theta_1, \ldots, \theta_K)$ is not identified by finding an observationally equivalent $\tilde\theta$.

Solution. Since addition is commutative, $\theta$ is observationally equivalent to any permutation of its values. For example, with $K=2$, $\theta = (\theta_1, \theta_2)$ is observationally equivalent to $\tilde\theta = (\theta_2, \theta_1)$ because
$$\begin{aligned}
P_\theta(Y=0) &= (1-\theta_1)(1-\theta_2) = P_{\tilde\theta}(Y=0) \\
P_\theta(Y=1) &= \theta_1(1-\theta_2) + \theta_2(1-\theta_1) = P_{\tilde\theta}(Y=1) \\
P_\theta(Y=2) &= \theta_1\theta_2 = P_{\tilde\theta}(Y=2).
\end{aligned}$$
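The equivalence is easy to verify numerically by computing the implied distribution of $Y$ under $\theta$ and under a permutation of it. A small sketch (the particular values of $\theta$ are just an example):

```python
from itertools import product

def pmf_Y(theta):
    """P(Y = y) when Y is the sum of independent Bernoulli(theta_k) draws."""
    K = len(theta)
    p = {y: 0.0 for y in range(K + 1)}
    for u in product((0, 1), repeat=K):
        prob = 1.0
        for uk, tk in zip(u, theta):
            prob *= tk if uk else 1 - tk
        p[sum(u)] += prob
    return p

print(pmf_Y((0.2, 0.7)))  # {0: 0.24, 1: 0.62, 2: 0.14} (up to floating point)
print(pmf_Y((0.7, 0.2)))  # identical, so the two parameter values are observationally equivalent
```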

  2. Show that if $\theta_1 = \theta_2 = \cdots = \theta_K = \vartheta$, then $\vartheta$ is identified.

Solution. In this case, the expectation of $Y$ is $E[Y] = K\vartheta$, so $\vartheta$ is identified by $E[Y]/K$.

Estimation

  1. Assuming $\theta_1 = \theta_2 = \cdots = \theta_K = \vartheta$, find the maximum likelihood estimator for $\vartheta$, and show whether it is unbiased.

Solution. The density of $Y_i$ with respect to the counting measure on $\{0,\ldots,K\}$ is
$$f(Y_i; \vartheta) = \binom{K}{Y_i}\vartheta^{Y_i}(1-\vartheta)^{K-Y_i}.$$
The log-likelihood is then
$$\log \ell(\vartheta; Y) = \sum_{i=1}^n \left( \log\binom{K}{Y_i} + Y_i\log\vartheta + (K-Y_i)\log(1-\vartheta) \right).$$
Solving the first order condition for $\hat\vartheta$ gives
$$0 = \sum_{i=1}^n\left(\frac{Y_i}{\hat\vartheta} - \frac{K-Y_i}{1-\hat\vartheta}\right) \quad\Longrightarrow\quad \hat\vartheta = \frac{1}{nK}\sum_{i=1}^n Y_i.$$

This estimator is unbiased: $E[\hat\vartheta] = \frac{1}{nK}\sum_{i=1}^n E[Y_i] = \vartheta$.
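Since each $Y_i \sim \text{Binomial}(K, \vartheta)$ and the $Y_i$ are independent, $\sum_i Y_i \sim \text{Binomial}(nK, \vartheta)$, so $E[\hat\vartheta]$ can also be computed exactly from the binomial pmf. A minimal check (the values of $n$, $K$, and $\vartheta$ are arbitrary examples):

```python
from math import comb

n, K, theta = 5, 3, 0.4
N = n * K  # sum_i Y_i ~ Binomial(N, theta)

# E[theta_hat] = E[sum_i Y_i] / (nK), computed from the exact binomial pmf
E_theta_hat = sum(s * comb(N, s) * theta**s * (1 - theta)**(N - s) for s in range(N + 1)) / N
print(E_theta_hat)  # 0.4 up to floating point, i.e. E[theta_hat] = theta
```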

  2. Show whether or not the maximum likelihood estimator is consistent.

Solution. Note that $E[|Y_i|] = E[Y_i] = K\vartheta$ is finite, and the $Y_i$ are iid. Therefore, by the law of large numbers, $\frac{1}{n}\sum_{i=1}^n Y_i \to_p E[Y_i] = K\vartheta$, so $\frac{1}{nK}\sum_{i=1}^n Y_i \to_p \vartheta$.
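A quick simulation illustrates the consistency: as $n$ grows, $\hat\vartheta$ concentrates around $\vartheta$. A sketch with invented parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
K, theta = 3, 0.4

for n in (10, 100, 1_000, 10_000):
    Y = rng.binomial(K, theta, size=n)  # Y_i = sum of K independent Bernoulli(theta) draws
    print(n, Y.mean() / K)              # theta_hat approaches 0.4 as n grows
```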

Testing

For this part, still assume $\theta_1 = \theta_2 = \cdots = \theta_K = \vartheta$.

  1. Find the most powerful test of $H_0: \vartheta = \vartheta_0$ against $H_1: \vartheta = \vartheta_a$ where $\vartheta_a > \vartheta_0$.

Solution. By the Neyman-Pearson Lemma, the likelihood ratio test is most powerful. Let’s describe the critical region for this test. The likelihood ratio (in which the binomial coefficients cancel) is
$$\begin{aligned}
lr(\vartheta_a, \vartheta_0; Y) &= \frac{\prod_i \vartheta_a^{Y_i}(1-\vartheta_a)^{K-Y_i}}{\prod_i \vartheta_0^{Y_i}(1-\vartheta_0)^{K-Y_i}} \\
&= \frac{\vartheta_a^{\sum_i Y_i}(1-\vartheta_a)^{nK - \sum_i Y_i}}{\vartheta_0^{\sum_i Y_i}(1-\vartheta_0)^{nK - \sum_i Y_i}} \\
&= \left(\frac{\vartheta_a}{\vartheta_0}\right)^{\sum_i Y_i}\left(\frac{1-\vartheta_a}{1-\vartheta_0}\right)^{nK - \sum_i Y_i}.
\end{aligned}$$
When $\vartheta_a > \vartheta_0$, so that $\frac{\vartheta_a}{\vartheta_0} > 1$ and $\frac{1-\vartheta_a}{1-\vartheta_0} < 1$, the likelihood ratio is an increasing function of $\sum_i Y_i$ and does not depend on the data in any other way. The critical region for a test of size $\alpha$ will be $C = \left\{Y : \sum_i Y_i > c(\alpha, \vartheta_0)\right\}$ where $P_{\vartheta_0}\left(\sum_i Y_i > c(\alpha, \vartheta_0)\right) = \alpha$.
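Under $H_0$, $\sum_i Y_i \sim \text{Binomial}(nK, \vartheta_0)$, so $c(\alpha, \vartheta_0)$ is an upper quantile of that distribution. Because the distribution is discrete, exact size $\alpha$ generally requires randomization; the sketch below instead takes the smallest $c$ with rejection probability at most $\alpha$ (all numerical values are illustrative):

```python
from math import comb

def binom_sf(N, p, c):
    """P(S > c) for S ~ Binomial(N, p)."""
    return sum(comb(N, s) * p**s * (1 - p)**(N - s) for s in range(c + 1, N + 1))

n, K, theta0, alpha = 5, 3, 0.4, 0.05
N = n * K

# Smallest critical value c with P_{theta0}(sum_i Y_i > c) <= alpha
c = min(c for c in range(N + 1) if binom_sf(N, theta0, c) <= alpha)
print(c, binom_sf(N, theta0, c))  # critical value and the exact rejection probability under H0
```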

  2. Is this test also most powerful against the alternative $H_1: \vartheta > \vartheta_0$? (Hint: does the critical region depend on $\vartheta_a$?)

Solution. In the previous part, we saw that the critical region is the same for any $\vartheta_a > \vartheta_0$. Hence, the test is uniformly most powerful against $H_1: \vartheta > \vartheta_0$.

Problem 2

Suppose you have a linear model with grouped data $y_{ij} = x_{ij}'\beta + u_{ij}$, where there are $j = 1, \ldots, J$ groups with $i = 1, \ldots, n_j$ individuals each, and $x_{ij} \in \mathbb{R}^k$. For example, $j$ could be firms, and $i$ could index workers in the firms. Throughout, assume that observations are independent across $j$, $E[u_{ij}] = 0$, and $E[x_{ij}u_{ij}] = 0$. $y$ and $x$ are observed, but $u$ is not. Assume that $E[x_{ij}x_{ij}']$ has rank $k$.

Identification

  1. Show that $\beta$ is identified by explicitly writing $\beta$ as a function of the distribution of $x_{ij}$ and $y_{ij}$.

Solution. We can identify $\beta$ by regression, $\beta = E[x_{ij}x_{ij}']^{-1}E[x_{ij}y_{ij}]$.

  2. Suppose that instead of observing $(y_{ij}, x_{ij})$ for each $i$ and $j$, you only observe group averages, $\bar{y}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} y_{ij}$ and $\bar{x}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} x_{ij}$. Can $\beta$ still be identified?

Solution. Yes, $\beta$ can still be identified by $\beta = E[\bar{x}_j\bar{x}_j']^{-1}E[\bar{x}_j\bar{y}_j]$ as long as $E[\bar{x}_j\bar{x}_j']$ is nonsingular.

Estimation

Continue to assume that you only observe group averages. Construct a sample analogue estimator for $\beta$ based on your answer to 2.b.2.

  1. Show whether your estimator is unbiased.

Solution. The estimator is
$$\hat\beta = \Big(\sum_j \bar{x}_j\bar{x}_j'\Big)^{-1}\sum_j \bar{x}_j\bar{y}_j.$$
Substituting the model in for $\bar{y}_j$, we have
$$\hat\beta = \Big(\sum_j \bar{x}_j\bar{x}_j'\Big)^{-1}\sum_j \bar{x}_j(\bar{x}_j'\beta + \bar{u}_j) = \beta + \Big(\sum_j \bar{x}_j\bar{x}_j'\Big)^{-1}\sum_j \bar{x}_j\bar{u}_j,$$
so $E[\hat\beta] = \beta + E\left[\Big(\sum_j \bar{x}_j\bar{x}_j'\Big)^{-1}\sum_j \bar{x}_j\bar{u}_j\right]$. We are assuming that $E[x_{ij}u_{ij}] = 0$, so $E[\bar{x}_j\bar{u}_j] = 0$ as well. However, this does not imply that $E\left[\Big(\sum_j \bar{x}_j\bar{x}_j'\Big)^{-1}\sum_j \bar{x}_j\bar{u}_j\right] = 0$, so $\hat\beta$ is biased.

For $\hat\beta$ to be unbiased, we would need the stronger assumption that $E[\bar{u}_j | \bar{x}_j] = 0$.
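A small simulation sketch of this group-average estimator; the data-generating process, group sizes, and coefficient values below are all invented for illustration (and this particular DGP happens to satisfy the stronger assumption $E[\bar{u}_j|\bar{x}_j]=0$):

```python
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([1.0, -0.5])   # true coefficients (illustrative)
J, n_j, k = 200, 10, 2         # number of groups, group size, number of regressors

xbar = np.empty((J, k))
ybar = np.empty(J)
for j in range(J):
    x = rng.normal(size=(n_j, k))          # individual-level regressors
    y = x @ beta + rng.normal(size=n_j)    # y_ij = x_ij' beta + u_ij
    xbar[j], ybar[j] = x.mean(axis=0), y.mean()

# Sample analogue of E[xbar_j xbar_j']^{-1} E[xbar_j ybar_j]
beta_hat = np.linalg.solve(xbar.T @ xbar, xbar.T @ ybar)
print(beta_hat)  # close to beta
```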

  2. Assume that $E\left[\|\bar{x}_j\bar{x}_j'\|_2^2\right] \leq M$ and $E\left[\|\bar{x}_j\bar{u}_j\|_2^2\right] \leq M$ for all $j$. Show whether or not your estimator is consistent as $J \to \infty$.

Solution. In this question and in the distribution question, there is some complication because $E[\bar{x}_j\bar{x}_j']$ potentially varies with $j$. A really great answer recognizes this and addresses it in some way. An answer that correctly uses the law of large numbers and Slutsky’s lemma is also okay.

These assumptions imply that a law of large numbers applies to $\frac{1}{J}\sum_j \bar{x}_j\bar{x}_j'$ and $\frac{1}{J}\sum_j \bar{x}_j\bar{u}_j$. We can show this using Markov’s inequality, but doing so is not necessary for full credit. Using Markov’s inequality and independence across $j$, we have
$$P\left(\left\|\frac{1}{J}\sum_j\big(\bar{x}_j\bar{x}_j' - E[\bar{x}_j\bar{x}_j']\big)\right\| > \epsilon\right) \leq \frac{E\left[\left\|\frac{1}{J}\sum_j\big(\bar{x}_j\bar{x}_j' - E[\bar{x}_j\bar{x}_j']\big)\right\|^2\right]}{\epsilon^2} \leq \frac{\frac{1}{J^2}\sum_j E\left[\left\|\bar{x}_j\bar{x}_j'\right\|^2\right]}{\epsilon^2} \leq \frac{M}{J\epsilon^2},$$
so $\frac{1}{J}\sum_j\bar{x}_j\bar{x}_j' - \frac{1}{J}\sum_j E[\bar{x}_j\bar{x}_j'] \to_p 0$. An identical argument shows that $\frac{1}{J}\sum_j\bar{x}_j\bar{u}_j \to_p 0$.

Therefore,
$$\hat\beta = \beta + \left(\frac{1}{J}\sum_j\bar{x}_j\bar{x}_j'\right)^{-1}\frac{1}{J}\sum_j\bar{x}_j\bar{u}_j \to_p \beta,$$
provided that $\frac{1}{J}\sum_j E[\bar{x}_j\bar{x}_j']$ is invertible for all $J$ large enough. It would be sufficient to assume that $\lim_{J\to\infty}\frac{1}{J}\sum_j E[\bar{x}_j\bar{x}_j'] = C$ exists and $C$ is invertible, but weaker conditions are possible as well.
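The consistency claim can be illustrated by recomputing the estimator at increasing $J$ (again with an invented DGP, in the spirit of the sketch above):

```python
import numpy as np

rng = np.random.default_rng(1)
beta, n_j, k = np.array([1.0, -0.5]), 10, 2

for J in (50, 500, 5_000):
    x = rng.normal(size=(J, n_j, k))
    y = x @ beta + rng.normal(size=(J, n_j))     # y[j, i] = x_ij' beta + u_ij
    xbar, ybar = x.mean(axis=1), y.mean(axis=1)  # group averages
    beta_hat = np.linalg.solve(xbar.T @ xbar, xbar.T @ ybar)
    print(J, np.linalg.norm(beta_hat - beta))    # estimation error shrinks as J grows
```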

Efficiency

Assume that
$$E[u_{ij}u_{\ell k}] = \begin{cases} \sigma^2 & \text{if } i = \ell \text{ and } j = k \\ 0 & \text{otherwise.} \end{cases}$$

  1. Suppose you observe $y_{ij}$ and $x_{ij}$ for each $i$ and $j$. What is the minimal variance unbiased estimator for $c'\beta$ that is a linear function of $y$?

Solution. In this case, all the assumptions of the Gauss-Markov theorem are met, so ordinary least squares is the best linear unbiased estimator.

  2. Suppose you only observe group averages $\bar{y}_j$ and $\bar{x}_j$. What is the minimal variance unbiased estimator for $c'\beta$ that is a linear function of $\bar{y}$?

Solution. Now, the Gauss-Markov theorem does not directly apply because $\mathrm{var}(\bar{u}_j) = \sigma^2/n_j$ is not the same for all $j$. However, we can transform the model to make the variance constant,
$$\sqrt{n_j}\,\bar{y}_j = \sqrt{n_j}\,\bar{x}_j'\beta + \underbrace{\sqrt{n_j}\,\bar{u}_j}_{\tilde{u}_j}.$$
Now $E[\tilde{u}\tilde{u}'] = \sigma^2 I_J$, so the Gauss-Markov theorem applies and the best linear unbiased estimator is
$$\hat\beta_{WLS} = \Big(\sum_j n_j\bar{x}_j\bar{x}_j'\Big)^{-1}\sum_j n_j\bar{x}_j\bar{y}_j.$$
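A sketch of this weighted estimator, which just rescales each group's averages by $\sqrt{n_j}$ before running least squares (the group sizes and DGP below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
beta, k = np.array([1.0, -0.5]), 2
n_j = rng.integers(5, 50, size=300)   # unequal group sizes

xbar = np.vstack([rng.normal(size=(n, k)).mean(axis=0) for n in n_j])
ubar = np.array([rng.normal(size=n).mean() for n in n_j])   # var(ubar_j) = 1 / n_j
ybar = xbar @ beta + ubar             # group-averaged model

w = np.sqrt(n_j)                      # multiply each group's row by sqrt(n_j)
Xw, yw = xbar * w[:, None], ybar * w
# Equals (sum_j n_j xbar_j xbar_j')^{-1} sum_j n_j xbar_j ybar_j
beta_wls = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
print(beta_wls)
```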

Distribution

Continue to assume that you only observe group averages. Let $\hat\beta_{WLS} = \left(\sum_{j=1}^J n_j\bar{x}_j\bar{x}_j'\right)^{-1}\left(\sum_{j=1}^J n_j\bar{x}_j\bar{y}_j\right)$. Show that
$$\sqrt{J}(\hat\beta_{WLS} - \beta) = \sqrt{J}\left(\sum_{j=1}^J n_j\bar{x}_j\bar{x}_j'\right)^{-1}\left(\sum_{j=1}^J n_j\bar{x}_j\bar{u}_j\right)$$
converges in distribution and compute the limiting distribution. State any additional assumptions that you need to show convergence.

Solution. As above, assuming $E\left[\|n_j\bar{x}_j\bar{x}_j'\|_2^2\right]$ is bounded for all $j$ is sufficient for a law of large numbers to apply to $\frac{1}{J}\sum_j n_j\bar{x}_j\bar{x}_j'$.

We already have assumptions that imply $E[n_j\bar{x}_j\bar{u}_j] = 0$. If we also assume that this term has a variance, then a central limit theorem would apply. It would be an acceptable (and the expected) answer to just assume $E[n_j^2\bar{x}_j\bar{x}_j'\bar{u}_j^2] = \Omega$ for all $j$ and proceed.

Alternatively, we could assume $E[u_{ij}^2 | x_{ij}] = \sigma^2$ for all $i$ and $j$ and that the $u_{ij}$ are independent across $i$ and $j$. Then,
$$E[n_j^2\bar{x}_j\bar{x}_j'\bar{u}_j^2] = E\left[n_j\bar{x}_j\bar{x}_j'\, E[n_j\bar{u}_j^2 \mid \bar{x}_j]\right] = E[n_j\bar{x}_j\bar{x}_j']\sigma^2.$$
If we also assume $E[x_{ij}x_{ij}'] = C$ is the same for all $i$ and $j$ and observations are independent, then $E[n_j\bar{x}_j\bar{x}_j'] = C$ for all $j$.

In that case, by the central limit theorem,
$$\frac{1}{\sqrt{J}}\sum_{j=1}^J n_j\bar{x}_j\bar{u}_j \to_d N\big(0, E[x_{ij}x_{ij}']\sigma^2\big),$$
and therefore
$$\sqrt{J}(\hat\beta_{WLS} - \beta) \to_d N\big(0, E[x_{ij}x_{ij}']^{-1}\sigma^2\big).$$
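A Monte Carlo sketch of this limiting distribution: under an invented DGP satisfying the simplifying assumptions above (iid regressors with $E[x_{ij}x_{ij}'] = I$, homoskedastic errors with $\sigma^2 = 1$, and equal group sizes, so the $\sqrt{n_j}$ weights cancel), the sample covariance of $\sqrt{J}(\hat\beta_{WLS}-\beta)$ across replications should be close to $E[x_{ij}x_{ij}']^{-1}\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(3)
beta, k, J, n, reps = np.array([1.0, -0.5]), 2, 200, 10, 500

draws = np.empty((reps, k))
for r in range(reps):
    x = rng.normal(size=(J, n, k))            # iid regressors, E[x x'] = I
    y = x @ beta + rng.normal(size=(J, n))    # homoskedastic errors, sigma^2 = 1
    xbar, ybar = x.mean(axis=1), y.mean(axis=1)
    beta_wls = np.linalg.solve(xbar.T @ xbar, xbar.T @ ybar)  # weights cancel with equal group sizes
    draws[r] = np.sqrt(J) * (beta_wls - beta)

print(np.cov(draws, rowvar=False))  # approximately E[x x']^{-1} sigma^2 = I_2
```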

Definitions and Results

  • Measure and Probability:

    • A collection of subsets, $\mathcal{F}$, of $\Omega$ is a σ-field if

      1. $\Omega \in \mathcal{F}$
      2. If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$
      3. If $A_1, A_2, \ldots \in \mathcal{F}$, then $\cup_{j=1}^\infty A_j \in \mathcal{F}$
    • A measure is a function $\mu: \mathcal{F} \to [0,\infty]$ s.t.

      1. $\mu(\emptyset) = 0$
      2. If $A_1, A_2, \ldots \in \mathcal{F}$ are pairwise disjoint, then $\mu\left(\cup_{j=1}^\infty A_j\right) = \sum_{j=1}^\infty \mu(A_j)$
    • The Lebesgue integral is

      1. Positive: if $f \geq 0$ a.e., then $\int f\,d\mu \geq 0$
      2. Linear: $\int(af + bg)\,d\mu = a\int f\,d\mu + b\int g\,d\mu$
    • Radon-Nikodym derivative: if $\nu \ll \mu$, then there is a nonnegative measurable function, $\frac{d\nu}{d\mu}$, s.t. $\nu(A) = \int_A \frac{d\nu}{d\mu}\,d\mu$

    • Monotone convergence: If $f_n: \Omega \to \mathbb{R}$ are measurable, $f_n \geq 0$, and for each $\omega \in \Omega$, $f_n(\omega) \uparrow f(\omega)$, then $\int f_n\,d\mu \to \int f\,d\mu$ as $n \to \infty$

    • Dominated convergence: If $f_n: \Omega \to \mathbb{R}$ are measurable, for each $\omega \in \Omega$, $f_n(\omega) \to f(\omega)$, and $|f_n| \leq g$ for each $n \geq 1$ for some $g \geq 0$ with $\int g\,d\mu < \infty$, then $\int f_n\,d\mu \to \int f\,d\mu$

    • Markov’s inequality: $P(|X| > \epsilon) \leq \frac{E[|X|^k]}{\epsilon^k}$ $\forall \epsilon > 0, k > 0$

    • Jensen’s inequality: if $g$ is convex, then $g(E[X]) \leq E[g(X)]$

    • Cauchy-Schwarz inequality: $(E[XY])^2 \leq E[X^2]E[Y^2]$

    • $\sigma(X)$ is the σ-field generated by $X$; it is

      • the smallest σ-field w.r.t. which $X$ is measurable
      • $\sigma(X) = \{X^{-1}(B) : B \in \mathcal{B}(\mathbb{R})\}$
    • $\sigma(W) \subseteq \sigma(X)$ iff $\exists g$ s.t. $W = g(X)$

    • Events $A_1, \ldots, A_m$ are independent if for any sub-collection $A_{i_1}, \ldots, A_{i_s}$, $P\left(\cap_{j=1}^s A_{i_j}\right) = \prod_{j=1}^s P(A_{i_j})$. σ-fields are independent if this is true for any events from them. Random variables are independent if their σ-fields are.

    • Conditional expectation of $Y$ given σ-field $\mathcal{G}$ satisfies $\int_A E[Y|\mathcal{G}]\,dP = \int_A Y\,dP$ $\forall A \in \mathcal{G}$

  • Identification: $X$ observed, distribution $P_X$, probability model $\mathcal{P}$

    • $\theta_0 \in \mathbb{R}^k$ is identified in $\mathcal{P}$ if there exists a known $\psi: \mathcal{P} \to \mathbb{R}^k$ s.t. $\theta_0 = \psi(P_X)$
    • $\mathcal{P} = \{P(\cdot; s) : s \in S\}$; two structures $s$ and $\tilde{s}$ in $S$ are observationally equivalent if they imply the same distribution for the observed data, i.e. $P(B; s) = P(B; \tilde{s})$ for all $B \in \sigma(X)$.
    • Let $\lambda: S \to \mathbb{R}^k$; $\theta$ is observationally equivalent to $\tilde\theta$ if $\exists s, \tilde{s} \in S$ that are observationally equivalent and $\theta = \lambda(s)$ and $\tilde\theta = \lambda(\tilde{s})$
    • $s_0 \in S$ is identified if there is no $s \neq s_0$ that is observationally equivalent to $s_0$
    • $\theta_0$ is identified (in $S$) if there is no observationally equivalent $\tilde\theta \neq \theta_0$
  • Cramér-Rao Bound: in the parametric model $P_X \in \{P_\theta : \theta \in \mathbb{R}^d\}$ with likelihood $\ell(\theta; x)$, if appropriate derivatives and integrals can be interchanged, then for any unbiased estimator $\tau(X)$, $\mathrm{Var}_\theta(\tau(X)) \geq I(\theta)^{-1}$, where $I(\theta) = \int s(x,\theta)s(x,\theta)'\,dP_\theta(x) = -E[H(x,\theta)]$ and $s(x,\theta) = \frac{\partial \log\ell(\theta;x)}{\partial\theta}$

  • Hypothesis testing:

    • $P(\text{reject } H_0 \mid P_X \in \mathcal{P}_0)$ = Type I error rate $= P_X(C)$
    • $P(\text{fail to reject } H_0 \mid P_X \in \mathcal{P}_1)$ = Type II error rate
    • $P(\text{reject } H_0 \mid P_X \in \mathcal{P}_1)$ = power
    • $\sup_{P_X \in \mathcal{P}_0} P_X(C)$ = size of test
    • Neyman-Pearson Lemma: Let $\Theta = \{0,1\}$, $f_0$ and $f_1$ be densities of $P_0$ and $P_1$, $\tau(x) = f_1(x)/f_0(x)$, and $C^* = \{x \in \mathcal{X} : \tau(x) > c\}$. Then among all tests $C$ s.t. $P_0(C) = P_0(C^*)$, $C^*$ is most powerful.
  • Projection: $P_L y \in L$ is the projection of $y$ on $L$ if $\|y - P_L y\| = \inf_{w \in L}\|y - w\|$

    1. $P_L y$ exists, is unique, and is a linear function of $y$
    2. For any $y_1 \in L$, $y_1 = P_L y$ iff $y - y_1 \perp L$
    3. $G = P_L$ iff $Gy = y$ $\forall y \in L$ and $Gy = 0$ $\forall y \perp L$
    4. A linear $G: V \to V$ is a projection map onto its range, $R(G)$, iff $G$ is idempotent and symmetric.
  • Gauss-Markov: $Y = \theta + u$ with $\theta \in L \subseteq \mathbb{R}^n$, a known subspace. If $E[u] = 0$ and $E[uu'] = \sigma^2 I_n$, then the best linear unbiased estimator (BLUE) of $a'\theta$ is $a'\hat\theta$, where $\hat\theta = P_L Y$

  • Convergence in probability:

    • $X_1, X_2, \ldots$ converge in probability to $Y$ if $\forall \epsilon > 0$, $\lim_{n\to\infty} P(\|X_n - Y\| > \epsilon) = 0$
    • If $\lim_{n\to\infty} E\left[\|X_n - Y\|^p\right] = 0$, then $X_n \to_p Y$
    • If $X_n \to_p X$ and $f$ is continuous, then $f(X_n) \to_p f(X)$
    • Weak LLN: if $X_1, \ldots, X_n$ are i.i.d. and $E[X^2]$ exists, then $\frac{1}{n}\sum_{i=1}^n X_i \to_p E[X]$
    • $X_n = O_p(b_n)$ if $\forall \epsilon > 0$ $\exists M_\epsilon$ s.t. $\limsup P\left(\left\|\frac{X_n}{b_n}\right\| \geq M_\epsilon\right) < \epsilon$
    • $X_n = o_p(b_n)$ if $\frac{X_n}{b_n} \to_p 0$
  • Convergence in distribution:

    • $X_1, X_2, \ldots$ converge in distribution to $X$ if $\forall f \in C_b$, $E[f(X_n)] \to E[f(X)]$
    • If $X_n \to_d X$ and $g$ is continuous, then $g(X_n) \to_d g(X)$
    • Slutsky’s lemma: if $Y_n \to_p c$, $X_n \to_d X$, and $g$ is continuous, then $g(Y_n, X_n) \to_d g(c, X)$
    • Levy’s Continuity Theorem: $X_n \to_d X$ iff $E[e^{itX_n}] \to E[e^{itX}]$ $\forall t$
    • CLT: if $X_1, \ldots, X_n$ are i.i.d. with $E[X_1] = \mu$ and $\mathrm{Var}(X_1) = \sigma^2$, then $\frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{X_i - \mu}{\sigma} \to_d N(0,1)$
    • Delta Method: suppose $\sqrt{n}(\hat\theta - \theta_0) \to_d S$ and $h$ is differentiable; then $\sqrt{n}\left(h(\hat\theta) - h(\theta_0)\right) \to_d h'(\theta_0) S$

Footnotes

  1. If you cannot find the maximum likelihood estimator, show whether $\bar{Y}/K = \frac{1}{nK}\sum_{i=1}^n Y_i$ is unbiased and consistent for partial credit.

  2. If you could not answer that part, suppose you had shown that $\beta = E[\bar{x}_j\bar{x}_j']^{-1}E[\bar{x}_j\bar{y}_j]$.