Friday, 30 April 2010

rt.representation theory - What is the Zariski closure of the space of semisimple Lie algebras?

A (hopefully helpful) comment. I've thought about the problem of finding the closure of one isomorphism class - I haven't got an answer, but I had an idea that may be helpful towards a solution.



Consider the closure of the set $S$ of Lie algebras isomorphic to a fixed semisimple Lie algebra $L$ of dimension $n$, and fix a basis of $L$, which gives you the structure constants. Then there is a surjection from the invertible matrices $GL_n(\mathbb{C})$ onto $S$: act on a fixed standard basis of $V$ with $x \in GL_{n}(\mathbb{C})$ to get another basis, force this new basis to have the properties of the basis of $L$ fixed above (i.e. its structure constants), and then trace this back to get the values of $\Gamma^k_{ij}$ defining this particular Lie algebra.
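To make the action concrete, here is a small numerical sketch (my own illustration, not part of the original argument): it transports the structure constants of $sl_2(\mathbb{C})$ along a change of basis $x \in GL_3(\mathbb{C})$, so the orbit of the structure-constant tensor is the isomorphism class, and the stabilizer is the automorphism group.

import numpy as np

# Structure constants of sl_2 in the basis (e, f, h):
# [e,f] = h, [h,e] = 2e, [h,f] = -2f.
# Gamma[i, j, k] = coefficient of basis vector k in [x_i, x_j].
n = 3
Gamma = np.zeros((n, n, n))
E, F, H = 0, 1, 2
Gamma[E, F, H], Gamma[F, E, H] = 1.0, -1.0
Gamma[H, E, E], Gamma[E, H, E] = 2.0, -2.0
Gamma[H, F, F], Gamma[F, H, F] = -2.0, 2.0

def transport(Gamma, g):
    # New basis f_i = sum_a g[a, i] e_a; returns the structure
    # constants Gamma'[i, j, k] in the new basis.
    ginv = np.linalg.inv(g)
    return np.einsum('ai,bj,abc,kc->ijk', g, g, Gamma, ginv)

g = np.random.default_rng(0).standard_normal((n, n))  # generically invertible
Gamma2 = transport(Gamma, g)   # another point of the orbit S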



To be precise, the above is best described as a transitive action of the algebraic group $GL_{n}(\mathbb{C})$ on the variety $S$. Some matrices act trivially, however, and these correspond to automorphisms of the Lie algebra (which leave the structure constants invariant) – i.e. the point stabilizers correspond to automorphisms of the Lie algebra, so the homogeneous space has the structure of the quotient of $GL_{n}(\mathbb{C})$ by this point stabilizer, which is the group of automorphisms of $L$.



I think this could help in getting the closure of a single isomorphism class of Lie algebras (and since there are only finitely many isomorphism classes of semisimple Lie algebras of a fixed dimension, it should help with that problem too). But I'm not sure how – I tried naively by saying that perhaps this closure consists of the union of the isomorphism classes which you get, in an intuitive sense, by allowing the matrix $x$ to be singular as well; but what I get from that seems to be rubbish, so I'm sure that path is mistaken.

Wednesday, 28 April 2010

schubert cells - Detailed proof of cup product equivalent to intersection

Bott and Tu do this completely, in the de Rham theoretic setting of course.



Here's an alternate proof I plan to use in singular theory next time I teach this material, which I find slightly more direct than using Thom classes (which require the tubular neighborhood theorem, etc):



Definition: Given a collection $S = \{W_i\}$ of submanifolds of a manifold $X$, define the smooth chain complex transverse to $S$, denoted $C^S_*(X)$, by using the subgroups of the singular chain groups in which the basis chains $\Delta^n \to X$ are smooth and transverse to all of the $W_i$.



Lemma: The inclusion $C^S_*(X) \to C_*(X)$ is a quasi-isomorphism, for any such collection $S$.



Now if $W \in S$ then "count of intersection with $W$" gives a perfectly well-defined element $\tau_W$ of
$\mathrm{Hom}(C^S_*(X), A)$, and thus by this quasi-isomorphism a well-defined cocycle, provided $W$ is proper and has no boundary. It is immediate that this cocycle evaluates on cycles which are represented by closed submanifolds through the intersection count.
It is also not hard (but takes a bit to work out all the details) to show that the cup product of these cochains (when the submanifolds intersect transversally) is given by the intersection class of their intersection - we compute on the chains which intersect all of $W$, $V$ and $W \cap V$ transversally and reduce to linear settings. Consider for example $W$ the $x$-axis in the plane and $V$ the $y$-axis; then various $2$-simplices can contain the origin (or not) and have various faces which intersect the axes (or not), all consistent with the formula for the cup product.

Tuesday, 27 April 2010

at.algebraic topology - homotopy type of complement of subspace arrangement

I am studying the homotopy type of a space, and I hope it is a $K(\pi,1)$ space.
I have found its covering; once we can show the covering is a $K(\pi,1)$, so is the space
itself. The covering is



$\mathbb{R}^4-M$ where $M=M_1\cup M_2\cup M_3\cup M_4$,



$M_1=\{(x,y,z,w)\mid x,y \in \mathbb{R},\ z,w \in\mathbb{Z}\}$



$M_2=\{(x,y,z,w)\mid y,z \in \mathbb{R},\ x,w \in\mathbb{Z}\}$



$M_3=\{(x,y,z,w)\mid x,w \in \mathbb{R},\ y,z \in\mathbb{Z}\}$



$M_4=\{(x,y,z,w)\mid z,w \in \mathbb{R},\ x,y \in\mathbb{Z}\}$



I guess $\mathbb{R}^4-M$ is a $K(\pi,1)$ space; can someone help prove this?

Monday, 26 April 2010

ag.algebraic geometry - Who proved the exactness of Amitsur's complex ?

A foundational result in Grothendieck's descent theory and in his étale cohomology is the exactness of Amitsur's complex. More precisely, suppose we have an $A$-algebra
$A\to B$; then there is a cosimplicial complex associated to it whose $n$-cosimplices are
$B^{\otimes(n+1)}$, and from it one obtains a complex
$$ 0\to A \to B \to B \otimes B \to \ldots \quad (\mathrm{AMITSUR}) $$
For example the map $B \to B \otimes B$ is $b\mapsto 1\otimes b - b \otimes 1$, and the following maps are obtained similarly by inserting $1$'s in tensor products of copies of $B$ and taking alternating sums. The key result is that this Amitsur complex is exact if the initial algebra $A \to B$ is faithfully flat.



The proof is splendid: "one" (ah, that's the point!) remarks that if the structural map has an $A$-linear retraction, then it is easy to conclude by constructing a homotopy. And then one reduces to this case by a bold gambit: since one doesn't know how to prove exactness of $(\mathrm{AMITSUR})$, one tensors with $B$ and gets the even more complicated complex $(\mathrm{AMITSUR})\otimes B$. But now the initial map $A\to B$ has become $B \to B\otimes B:\ b \mapsto 1\otimes b$, which HAS a retraction: just take the product $B\otimes B \to B:\ b\otimes b' \mapsto bb'$. So the tensored complex is exact and the initial complex was necessarily exact by faithful flatness.
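For concreteness, here is the standard contracting homotopy in the retraction case (the indexing convention below is mine): if $r\colon B\to A$ is an $A$-linear retraction, define $h\colon B^{\otimes(n+1)}\to B^{\otimes n}$ by
$$h(b_0\otimes b_1\otimes\cdots\otimes b_n) := r(b_0)\,b_1\otimes\cdots\otimes b_n.$$
In the lowest degree one checks $h(db) = h(1\otimes b - b\otimes 1) = b - r(b)\cdot 1$ and $d(h(b)) = r(b)\cdot 1$, so $hd+dh=\mathrm{id}$; the same computation goes through in every degree, so the complex is not merely exact but contractible.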



Question: who proved this? I suspect the argument I sketched is due to Grothendieck, since I couldn't find a reference to Amitsur in EGA or SGA.
So, what exactly did Amitsur prove in this context, and how did he do it? I have a vague intuition that he didn't express himself in terms of faithful flatness, but my Internet search failed miserably. So, dear mathoverflow participants, you are my last hope...

ho.history overview - Widely accepted mathematical results that were later shown wrong?

EDIT: The episode I had in mind turns out to be the work of Robert Coleman repairing a gap in a paper by Manin about the Mordell conjecture over function fields. See comments by KConrad below, giving specific references. Note that this is not about a false result, it is about an accepted proof with a gap that was found 20 to 25 years later and repaired.



Original: Requesting assistance with a memory.



This being community wiki, I will give my vague memory. I think someone who was actually there could tell a good story. I have been searching with combinations of words on Google with no success.



Anyhoo, when I was in graduate school at Berkeley in the 1980's, a professor, who I think was likely Robert Coleman, told us a story about a celebrated result on "function fields"
or the function field version of something...
The accepted proof was by someone really big; on Google I kept running across the name Manin, but I am not at all sure about the name. Prof. Coleman decided to present the proof to a class/seminar. Partway through, it became clear that the accepted proof just did not work. I have a sense that the class and professor were able to clean up the proof, but I have no idea what publication may have come of this. There is also the chance that the seminar did not occur at Berkeley, but rather at an earlier job of the professor concerned. Sigh.



So, there are a few ways this story could be filled in. Many MO people are students or postdocs at Berkeley; somebody could walk down the hall and ask Prof. Coleman if that was really him, and if so what actually happened, or ask Ken Ribet, etc. Again, someone on MO with encyclopedic knowledge of every possible use of the phrase "function field" might be able to say. Or someone very old, yea, verily stricken in years, like unto me.



Finally, note that the title and text of the OP's question disagree a little, and people have posted both "results" that remained false and correct results with incorrect proofs. Also, my memory is really quite good, but I heard this story once and did my dissertation on differential geometry and minimal submanifolds.

reference request - Computing the index of a Lie algebra: what is known beyond the reductive case?

There is quite a bit of literature by now, in the classical characteristic 0 setting of finite dimensional Lie algebras. Looking up some of the papers listed below on arXiv (usually under math.RT) and others they refer to would be a good way to get into the recent work, including some on nilpotent Lie algebras. Beyond this, I can't answer your specific questions in detail. But as Victor points out, study of the index is only one step. Even in the reductive case, the rank is just one piece of information.



Dmitri I. Panyushev http://front.math.ucdavis.edu/0107.5031



A.N. Panov http://front.math.ucdavis.edu/0801.3025



Jean-Yves Charbonnel and Anne Moreau http://front.math.ucdavis.edu/1005.0831



Celine Righi and Rupert W. T. Yu http://front.math.ucdavis.edu/0908.4201

Sunday, 25 April 2010

gr.group theory - decreasing chain of subgroups in the Heisenberg group

Assuming this is the discrete Heisenberg group $H=H_3(\mathbb{Z})$, as in my comment above, then here is another way of looking at Mariano's answer (I think). Take any sequence of positive integers
$n_1 < n_2 < \dots$ where $n_i \mid n_{i+1}$ for all $i$, and put



$$ H_i = H_3(n_i\mathbb{Z}) $$



(Mariano's answer corresponds to taking $n_i = 2^i$.)

ct.category theory - If F is left adjoint to G, when does FG preserve limits? When do counits interchange with limits?

Motivation



Suppose that $F\colon X\to A$ is left adjoint to $G\colon A\to X$, and let
$\varepsilon\colon FG\stackrel{.}{\to}I_A$ be the counit of the adjunction.
Suppose also that $A$ is $J$-complete (for some category $J$), so that
$\operatorname{Lim}$ is a functor $C^J\to C$, where for an arrow
$\alpha\colon T_1\stackrel{.}{\to} T_2$ of $C^J$,
$\operatorname{Lim}(\alpha)$ is the unique arrow of $A$ for which the
following diagram is commutative:



$$
\begin{matrix}
\operatorname{Lim}(T_1)& \stackrel{\text{limiting cone}}{\longrightarrow} & T_1\\
| & & |\\
\operatorname{Lim}(\alpha) & & \alpha\\
\downarrow & & \downarrow \\
\operatorname{Lim}(T_2)& \stackrel{\text{limiting cone}}{\longrightarrow} & T_2
\end{matrix}
$$



Let $T\colon J\to A$ be a functor. We have the natural transformation
$\varepsilon T\colon FGT\stackrel{.}{\to} T$, and
$\operatorname{Lim}(\varepsilon T)$ is the dotted line making the
following diagram commutative:



$$
\begin{matrix}
\operatorname{Lim}(FGT)& \stackrel{\text{limiting cone}}{\longrightarrow} & FGT\\
| & & |\\
\operatorname{Lim}(\varepsilon T) & & \varepsilon T\\
\downarrow & & \downarrow \\
\operatorname{Lim}(T)& \stackrel{\text{limiting cone}}{\longrightarrow} & T
\end{matrix}
$$



If $FG$ preserves $J$-limits, and
$\tau\colon \operatorname{Lim}(T)\stackrel{.}{\to}T$ is the lower limiting cone,
then $FG\tau\colon FG\operatorname{Lim}(T)\stackrel{.}{\to}FGT$ is the upper
limiting cone, and the above diagram becomes



$$
\begin{matrix}
FG\operatorname{Lim}(T)& \stackrel{FG\tau}{\longrightarrow} & FGT\\
| & & |\\
\operatorname{Lim}(\varepsilon T) & & \varepsilon T\\
\downarrow & & \downarrow \\
\operatorname{Lim}(T)& \stackrel{\tau}{\longrightarrow} & T
\end{matrix}
$$



Since the naturality of $\varepsilon$ implies that for all $j\in
\operatorname{obj}(J)$ the diagram
$$
\begin{matrix}
FG\operatorname{Lim}(T)& \stackrel{FG\tau_j}{\longrightarrow} & FGT(j)\\
| & & |\\
\varepsilon_{\mathrm{Lim}T}& & \varepsilon_{T(j)}\\
\downarrow & & \downarrow \\
\operatorname{Lim}(T)& \stackrel{\tau_j}{\longrightarrow} & T(j)
\end{matrix}
$$



is commutative, it follows that $\varepsilon_{\mathrm{Lim}T}$
can replace $\operatorname{Lim}(\varepsilon T)$ in the last but one
diagram while keeping it commutative. By uniqueness, we get
the nice equation
$$
\varepsilon_{\mathrm{Lim}T} = \operatorname{Lim}(\varepsilon T).
$$
Note that everything seems to depend on $FG$ preserving $J$-limits.



Question



If $F\colon X\to A$ is left adjoint to $G\colon A\to X$ and $A$ has $J$-limits,
when does $FG$ preserve $J$-limits?
This is obviously true when $F$ preserves limits (for example, when
there is also a left adjoint to $F$), but are there other interesting
situations?



Background



For solving an exercise from Mac Lane, I used some
results from A. Gleason, ''Universally locally connected
refinements,'' Illinois J. Math, vol. 7 (1963), pp. 521--531. In that
paper, Gleason constructs a right adjoint to the inclusion functor
$\mathbf{LConn}\subset \mathbf{Top}$ ($\mathbf{LConn}=$ locally
connected spaces with continuous maps), and proves that the counit
of the product of two topological spaces is the product of the
counits (Theorem C). This made me curious about when counits
and limits interchange.

pr.probability - When is a 1-block factor of a non-Markovian process Markov?

Some necessary conditions for $Z$ to be Markov are easy to understand and to write down.



For every $y$, $y'$ and $y''$ in the state space $S$ of $Y$, write $p_3(yy'y'')$ for the probability that $[Y_t=y,Y_{t+1}=y',Y_{t+2}=y'']$, which is independent of time $t$. Assume that $Z=\phi(Y)$. For every $z$, $z'$ and $z''$ in $\phi(S)$, write $q_3(zz'z'')$ for the sum of $p_3(yy'y'')$ over every $y$, $y'$ and $y''$ such that $z=\phi(y)$, $z'=\phi(y')$ and $z''=\phi(y'')$. Then a necessary condition is that $q_3$ can be factorized, in the sense that there exist functions $r$ and $s$ such that $q_3(zz'z'')=r(zz')s(z'z'')$, for every $z$, $z'$ and $z''$ in $\phi(S)$.
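For what it's worth, this condition is easy to test numerically: $q_3(zz'z'')=r(zz')s(z'z'')$ holds iff, for each fixed middle state $z'$, the matrix $(z,z'')\mapsto q_3(zz'z'')$ has rank at most one. A small Python sketch (the toy chain, the lumping map phi, and all names are illustrative only):

import numpy as np

P = np.array([[0.5, 0.5, 0.0],     # transition matrix of Y (toy example)
              [0.2, 0.3, 0.5],
              [0.4, 0.1, 0.5]])
phi = np.array([0, 0, 1])          # the 1-block factor: lump states 0 and 1

# stationary distribution of Y: left eigenvector for eigenvalue 1
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.isclose(w, 1))])
pi /= pi.sum()

# p3[y, y', y''] = P(Y_t = y, Y_{t+1} = y', Y_{t+2} = y'')
p3 = pi[:, None, None] * P[:, :, None] * P[None, :, :]

# q3[z, z', z''] = sum of p3 over the fibres of phi
m = phi.max() + 1
q3 = np.zeros((m, m, m))
for y in range(len(pi)):
    for y1 in range(len(pi)):
        for y2 in range(len(pi)):
            q3[phi[y], phi[y1], phi[y2]] += p3[y, y1, y2]

# necessary condition: each middle slice has rank <= 1
for z1 in range(m):
    print(z1, np.linalg.matrix_rank(q3[:, z1, :], tol=1e-12))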



Of course, this condition is far from sufficient. In fact, to be able to say anything even moderately interesting about this problem, one should probably specify the kind of processes $Y$ and $Z$ one has in mind.

Saturday, 24 April 2010

Irreducibility of polynomials related to quadratic residues

Regarding the Galois group of the factor $g(x)=g(q,x)$ of these Fekete polynomials that is conjectured to be irreducible, here is PARI code that counts, for each prime $q \equiv 1 \bmod 4$, $17 \leq q \leq 100$, the number of primes $p$ for which $g(q,x)$ modulo $p$ has $n$ linear factors, for primes $p$ up to half a million, stored in the matrix entry $c[q,n+1]$, done for $0\leq n\leq 14$.



f(m,x)=sum(i=1,m-1,kronecker(i,m)*x^i)



g(m,x)=f(m,x)/(x*(x-1)^2*(x+1))



c=matrix(100,15); \\ c[q,n+1] counts primes p for which g(q,x) has n linear factors mod p
for(n=0,14,forprime(q=17,100,if(q%4==1,forprime(p=2,500000,if(matsize(polrootsmod(g(q,x),p))==[n,1], c[q,n+1]++)))))



If I did not make any mistakes, then, for these nine primes $q=17,29,37,41,53,61,73,89,97$, here is the $9$ by $15$ matrix showing $n=0$ to $14$.



[25204 3 12634 0 3084 0 538 0 72 0 3 0 0 0 0]
[25360 2 12486 0 3101 0 530 0 50 0 9 0 0 0 0]
[25133 1 12637 0 3187 0 519 0 59 0 2 0 0 0 0]
[25320 2 12519 1 3081 0 543 0 65 0 6 0 1 0 0]
[25293 2 12449 0 3206 0 500 0 81 0 6 0 1 0 0]
[25176 2 12603 0 3157 0 527 0 69 0 4 0 0 0 0]
[25110 3 12564 0 3274 0 516 0 65 0 5 0 1 0 0]
[25286 3 12554 0 3116 0 502 0 71 0 6 0 0 0 0]
[25112 3 12711 0 3132 0 526 0 49 0 5 0 0 0 0]

fa.functional analysis - Basis for L_infty(R)

The space $\ell^\infty_R$ does not have even an M-basis; i.e., a biorthogonal set $(x_t,x_t^*)$ such that the span of the $x_t$ is dense and the $x_t^*$ are total (Lindenstrauss, late 1960s IIRC), so it has nothing like a Schauder basis. Later I proved [PAMS 26, no. 3, 467-468 (1970)] that $\ell^\infty$ also does not have an M-basis. However, each of these spaces does have a biorthogonal set $(x_t,x_t^*)$ such that the span of the $x_t$ is dense. This is in my paper with W.J. Davis [Studia Math. 45, 173-179 (1973)].

mg.metric geometry - Upper bound for tetrahedron packing?

There have been several recent advances on packing regular tetrahedra in $\mathbb{R}^3$. All the results I've seen have been lower bounds -- first John Conway and Sal Torquato showed that there exists an arrangement of tetrahedra filling about 72% of space. This has been improved in a series of papers, and the latest result of which I am aware is Elizabeth Chen's record of 85.63%. (A NYTimes article summarizing the history of the problem can be found here.)



My question is does anyone know of any upper bounds, either published or unpublished? I saw a colloquium by Jeff Lagarias, and he said someone was claiming that they had proved something like $1 - 10^{-26}$, but that it was still unpublished.



(A compactness argument gives that since regular tetrahedra don't tile space the maximum volume is strictly less than one, but this argument does not give a quantitative bound.)

Friday, 23 April 2010

co.combinatorics - 1 rectangle

The upper bound is <3.95.



I hope the code below will show correctly...



It proves that assuming a sum >=3.95 in the central AxB rectangle of the grid
({-B,-B+A,-2A,-A,0,A,2A,B-A,B}+{0,A}) x ({-2B,-B-A,-B,-B+A,-2A,-A,0,A,2A,B-A,B,B+A,2B}+{0,B})
leads to a contradiction in a finite number of steps. 3.95 is NOT best possible for this grid, but 3.94 does not lead to a contradiction. It would be easy to refine the number, but
it is probably more worthwhile to search a larger grid (which starts to get slow in awk).



awk 'BEGIN {

A=1;
# pick B large enough to ensure that there
# are no accidental squares in the grid below
B=1000;

# setting up the grid
x[0]=-B;   x[1]=-B+A; x[2]=-B+2*A;
x[3]=-2*A; x[4]=-A;   x[5]=0;
x[6]=A;    x[7]=2*A;  x[8]=3*A;
x[9]=B-A;  x[10]=B;   x[11]=B+A;
M=11;

y[0]=-2*B;   y[1]=-B-A;   y[2]=-B;     y[3]=-B+A;
y[4]=-2*A;   y[5]=-A;     y[6]=0;      y[7]=A;
y[8]=2*A;    y[9]=B-2*A;  y[10]=B-A;   y[11]=B;
y[12]=B+A;   y[13]=B+2*A; y[14]=B+B-A; y[15]=B+B;
y[16]=B+B+A; y[17]=3*B;
N=17;

for(i=0; i<=M; i++)
for(j=i; j<=M; j++)
for(k=0; k<=N; k++)
for(l=k; l<=N; l++)
# 0 sum for degenerate rectangles
if(i==j || k==l) {
lo[i,j,k,l]=0;
hi[i,j,k,l]=0;
}
# squares
else if(x[j]-x[i]==y[l]-y[k]) {
lo[i,j,k,l]=-1;
hi[i,j,k,l]=1;
}
# other rectangles
else {
lo[i,j,k,l]=-4;
hi[i,j,k,l]=4;
}

# central rectangle: assume its sum is >=3.95
lo[5,6,6,11]=3.95;

iter=10000;
active=1;
while(iter-- && active) {
active=0;

# traverse all possible combinations of 1 rectangle split into 4
for(i=0; i<M; i++)
for(j=i+1; j<=M; j++)
for(k=0; k<N; k++)
for(l=k+1; l<=N; l++)
for(m=i; m<j; m++)
for(n=k; n<l; n++) {
lo0=lo[i,j,k,l];
lo1=lo[i,m,k,n];
lo2=lo[m,j,k,n];
lo3=lo[i,m,n,l];
lo4=lo[m,j,n,l];
hi0=hi[i,j,k,l];
hi1=hi[i,m,k,n];
hi2=hi[m,j,k,n];
hi3=hi[i,m,n,l];
hi4=hi[m,j,n,l];

# 3rd argument in the max() and min() functions
# is for printing purposes only...
lo0=max(lo0, lo1+lo2+lo3+lo4, 0);
hi0=min(hi0, hi1+hi2+hi3+hi4, 0);
lo1=max(lo1, lo0-hi2-hi3-hi4, 1);
lo2=max(lo2, lo0-hi1-hi3-hi4, 2);
lo3=max(lo3, lo0-hi1-hi2-hi4, 3);
lo4=max(lo4, lo0-hi1-hi2-hi3, 4);
hi1=min(hi1, hi0-lo2-lo3-lo4, 1);
hi2=min(hi2, hi0-lo1-lo3-lo4, 2);
hi3=min(hi3, hi0-lo1-lo2-lo4, 3);
hi4=min(hi4, hi0-lo1-lo2-lo3, 4);

if(lo0>hi0 || lo1>hi1 || lo2>hi2 || lo3>hi3 || lo4>hi4) {
print "CONTRADICTION AT", i,m,j,k,n,l;
exit;
}

lo[i,j,k,l]=lo0;
lo[i,m,k,n]=lo1;
lo[m,j,k,n]=lo2;
lo[i,m,n,l]=lo3;
lo[m,j,n,l]=lo4;
hi[i,j,k,l]=hi0;
hi[i,m,k,n]=hi1;
hi[m,j,k,n]=hi2;
hi[i,m,n,l]=hi3;
hi[m,j,n,l]=hi4;
}
}
print "FINISHED OK";
}

function max(s,t, where) {

if(s<t) {
print "lo=" t, "for", i,m,j,k,n,l, "(" where ")";
active=1;
s=t;
}
return(s);
}

function min(s,t, where) {

if(s>t) {
print "hi=" t, "for", i,m,j,k,n,l, "(" where ")";
active=1;
s=t;
}
return(s);
}
'

Wednesday, 21 April 2010

pr.probability - Result of repeated applications of the binomial distribution?

Here's how I interpret your example: there are a bunch of coins (k initially), each being flipped every round until it comes up tails, at which point the coin is "out." And you want to know, after n rounds, the probability that exactly j coins are still active, for j = 0, ..., k. (The existence of multiple players seems irrelevant.)



In that case, this is pretty elementary: after n rounds, the probability of each individual coin being active is p^n, so you have a binomial distribution with parameter p^n and k trials. Since you want to send p to 1 and n to infinity, note that replacing p by its square root and doubling n gives you the same distribution.
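A minimal sketch of that distribution (plain Python; the function name is mine):

from math import comb

def active_coin_dist(k, p, n):
    # Each coin survives n rounds with probability p**n, so the number
    # of active coins is Binomial(k, p**n).
    q = p ** n
    return [comb(k, j) * q**j * (1 - q)**(k - j) for j in range(k + 1)]

# Note the invariance mentioned above: (p, n) and (sqrt(p), 2n) agree.
print(active_coin_dist(10, 0.9, 5))
print(active_coin_dist(10, 0.9 ** 0.5, 10))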



Your problem has a surprisingly fascinating generalization, which I believe is called the Galton-Watson process. Its solution has a very elegant representation in terms of generating functions, but I think there are very few examples in which the probabilities are simple to compute in general. Your instance is one of those. (The generalization: at each round, you have a certain number of individuals, each of which turns (probabilistically, independently) into a finite number of identical individuals. If the individuals are coins and each coin turns into one coin with probability p and zero coins with probability 1-p, and you begin with k coins, then we recover your example.)

computer science - Programming Languages based on Category Theory

CAML by definition stands for Categorical Abstract Machine Language;



however, I am not certain that you can say that a language explicitly uses category theory. Perhaps you are asking "Are there languages that allow category theory concepts to be easily represented?" or perhaps you are asking whether the compilation or interpretation of a particular programming language uses category theory in its implementation.



While technically all Turing-complete languages should be equally able to express the same set of computations, some languages do so more elegantly than others, allowing the programmer or mathematician to be more eloquent.



I would say LISP and SCHEME, even though based on the lambda calculus, are more connected to the spirit of category theory in concept. While the numbers and integers are conceptually defined as atomic and can be built up from primitives in concept and in theory, in practice the implementations of SCHEME and LISP (and CLU) tend to take shortcuts to speed up implementation.



The hierarchical ability to pass functions and functions of functions (etc.) as first-class parameters to functions in LISP and SCHEME lets you emulate the actions or morphisms of category theory better in those languages than in others. You just have to start from the ground up, as I have not yet seen a library or package in LISP or SCHEME for category theory.
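As a rough illustration of that last point (in Python rather than LISP or SCHEME, but first-class functions are the only feature used), here is a sketch of composition and a functor-like map, with a spot check of the functor law on one example:

def compose(g, f):
    # morphism composition g . f
    return lambda x: g(f(x))

def fmap(f):
    # lift a morphism f: A -> B to lists, [A] -> [B]
    return lambda xs: [f(x) for x in xs]

double = lambda x: 2 * x
succ = lambda x: x + 1

xs = [1, 2, 3]
# functor law on this example: fmap(g . f) = fmap(g) . fmap(f)
assert fmap(compose(double, succ))(xs) == compose(fmap(double), fmap(succ))(xs)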

ag.algebraic geometry - Local fibration vs. stalkwise fibration

These are equivalent.



If $K$ is a simplicial set, and $\mathcal{F}$ is a simplicial presheaf, then there's a presheaf of sets $\mathcal{F}^K$, defined by $(\mathcal{F}^K)(U) = \hom(K, \mathcal{F}(U))$, where $\hom$ is maps of simplicial sets.



The important observation here is that if $K$ is a finite simplicial set, then formation of this gadget commutes with sheafification: $q^*(\mathcal{F}^K)\approx (q^*\mathcal{F})^K$. This is because $\mathcal{F} \mapsto \mathcal{F}^K$ is computed as a finite limit, if $K$ is finite.



Now consider the map of presheaves of sets $f: \mathcal{E}^{\Delta^n} \to \mathcal{E}^{\Lambda^n_k}\times_{\mathcal{B}^{\Lambda^n_k}} \mathcal{B}^{\Delta^n}$. Your map $p$ is a local fibration if the sheafification of $f$ is an epimorphism; the map $p$ is a stalkwise fibration if $q^*(f)$ is a surjection for each point $q$. If you have enough points, these mean the same thing.



(This is addressed in the introduction to the paper by Jardine, "Boolean localization in practice" (Documenta Mathematica, v.1), where he tells you what to do even if you don't have enough points!)

Tuesday, 20 April 2010

gt.geometric topology - Poincaré Conjecture and the Shape of the Universe

In Einstein's theory of General Relativity, the universe is a 4-manifold that might well be fibered by 3-dimensional time slices. If a particular spacetime doesn't have such a fibration, then it is difficult to construct a causal model of the laws of physics within it. (Even if you don't see an a priori argument for causality, without it, it is difficult to construct enough solutions to make meaningful predictions.) There isn't usually a geometrically distinguished fibration, but if you have enough symmetry or even local symmetry, the symmetry can select one. An approximate symmetry can also be enough for an approximately canonical fibration. Once you have all of that, the topology of spacelike slices of the universe is not at all a naive or risible question, at least not until you see more physics that might demote the question. The narrower question of whether the Poincaré Conjecture is relevant is more wishful and you could call it naive, but let's take the question of relating 3-manifold topology in general to cosmology.



The cosmic microwave background, discovered in 1964 by Penzias and Wilson, shows that the universe is very nearly isotropic at our location. (The deviation is of order $10^{-5}$, and it was only announced in 1992 after 2 years of data from the COBE telescope.) If you accept the Copernican principle that Earth isn't at a special point in space, it means that there is an approximately canonical fibration by time slices, and that the universe, at least approximately and locally, has one of the three isotropic Thurston geometries, $E^3$, $S^3$, or $H^3$. The Penzias-Wilson result makes it a really good question to ask whether the universe is a 3-manifold with some isotropic geometry and some fundamental group. I have heard that the early discussion of this question was so naive that some astronomers only talked about a 3-torus. They figured that if there were other choices from topology, they could think about them later. Notice that already, the Poincaré conjecture would have been more relevant to cosmology if it had been false!



The topologist who has done the most work on the question is Jeff Weeks. He coauthored a respected paper in cosmology and wrote an interesting article in the AMS Notices that promoted the Poincaré dodecahedral space as a possible topology for the universe. But after he wrote that article...



There indeed is other physics that does demote the 3-manifold question, and that is inflationary cosmology. The inflation theory posits that the truthful quantum field theory has a vaguely stable high-energy phase, which has such high energy density that the solution to the GR equations looks completely different. In the inflationary solution, hot regions of the universe expand by a factor of $e$ in something like $10^{-36}$ seconds. The different variations of the model posit anywhere from 60 to thousands of factors of $e$, or "$e$-folds". Patches of the hot universe also cool down, including the one that we live in. In fact every spot is constantly cooling down, but cooling is still overwhelmed by expansion. Instead of tacitly accepting certain observed features of the visible universe, for instance that it is approximately isotropic, inflation explains them. It also predicts that the visible universe is approximately flat and non-repeating, because macroscopic curvature and topology have been stretched into oblivion, and that observable anisotropies are stretch marks from the expansion. The stretch marks would have certain characteristic statistics in order to fit inflation. On the other hand, in the inflationary hot soup that we would never see directly, the rationale for canonical time slices is gone, and the universe would be some 4-manifold or even some fractal or quantum generalization of a 4-manifold.



The number of $e$-folds is not known and even the inflaton field (the sector of quantum field theory that governed inflation) is not known, but most or all models of inflation predict the same basic features. And the news from the successor to COBE, called WMAP, is that the visible universe is flat to 2% or so, and the anisotropy statistically matches stretch marks. There is not enough to distinguish most of the models of inflation. There is not enough to establish inflation in the same sense that the germ theory of disease or the heliocentric theory are established. What is true is that inflation has made experimental predictions that have been confirmed.



After all that news, the old idea that the universe is a visibly periodic 3-manifold is considered a long shot. WMAP didn't see any obvious periodicity, even though Weeks et al were optimistic based on its first year of data. But I was told by a cosmologist that periodicity should still be taken seriously as an alternative cosmological model, if possibly as a devil's advocate. A theory is incomplete science if it is hard to prove and every alternative is laughed out of the room. In arguing for inflation, cosmologists would also like to have something to argue against. In the opinion of the cosmologist that I talked to some years ago, the model of a 3-manifold with a fundamental group, developed by Weeks et al, is as good at that as any proposal.




José makes the important point that, in testing whether the universe has a visible fundamental group, you wouldn't necessarily look for direct periodicity represented by non-contractible geodesics. Instead, you could use harmonic analysis, using a suitable available Laplace operator, and this is what is used by Luminet, Weeks, Riazuelo, Lehoucq and Uzan. I should also say that I have not heard of any direct use of homotopy of paths in astronomy, but actually the direct geometry of geodesics does sometimes play an important role. For instance, look closely at this photograph of galaxy cluster Abell 1689. You can see that there is a strong gravitational lens just left of the center, between the telescope and the dimmer, slivered galaxies. Maybe no analysis of the cosmic microwave background would be geometry-only, but geometry would modify the apparent texture of the background, and I think that that is part of the argument from the data that the visible universe is approximately flat. Who is to say whether a hypothetical periodicity would be seen with geodesics, harmonic expansion, or in some other way.



Part of Gromov's point seems fair. I think it is true that you can always expand the scale of proposed periodicity to say that you haven't yet seen it, or that the data only just starts to show it. Before they saw anisotropy with COBE, that kept getting pushed back too. The deeper problem is that the 3-manifold topology of the universe does not address as many issues in cosmology, either theoretical or experimental, as inflation theory does.

Sunday, 18 April 2010

Is there a model theoretic realization of the concept of Arithmetical Hierachy?

There is one more well-known equivalence for $\forall\exists$ sentences.



Theorem (Chang-Łoś-Suszko). A theory $T$ is preserved under taking unions of increasing chains of structures if and only if $T$ is equivalent to a set of $\forall\exists$ sentences.



For a proof, see Keisler, "Fundamentals of model theory", Handbook of Mathematical Logic, p. 63.



I found a related paper, which is older and doesn't quite answer your question but may be of interest. R. C. Lyndon, "Properties preserved under algebraic constructions", Bull. Amer. Math. Soc. 65 n. 5 (1959), 287-299, Project Euclid



According to that paper, and MathSciNet, a general solution to your question should be contained in H. J. Keisler, "Theory of models with generalized atomic formulas", J. Symbolic Logic v. 25 (1960) 1-26,
MathSciNet, JStor

Friday, 16 April 2010

lo.logic - candidate for rigorous _mathematical_ definition of "canonical"?

Although the Bourbaki formulation of set theory is very seldom used in foundations, the existence of a definable Hilbert $\varepsilon$ operator has been well studied by set theorists, but under a different name. The hypothesis that there is a definable well-ordering of the universe of sets is denoted V = OD (or V = HOD); this hypothesis is equivalent to the existence of a definable Hilbert $\varepsilon$ operator.



More precisely, an ordinal definable set is a set $x$ which is the unique solution to a formula $\phi(x,\alpha)$ where $\alpha$ is an ordinal parameter. Using the reflection principle and syntactic tricks, one can show that there is a single formula $\theta(x,\alpha)$ such that for every ordinal $\alpha$ there is a unique $x$ satisfying $\theta(x,\alpha)$, and every ordinal definable set is the unique solution of $\theta(x,\alpha)$ for some ordinal $\alpha$. Therefore, the (proper class) function $T$ defined by $T(\alpha) = x$ iff $\theta(x,\alpha)$ enumerates all ordinal definable sets.



The axiom V = OD is the sentence $\forall x\,\exists\alpha\,\theta(x,\alpha)$. If this statement is true, then given any formula $\phi(x,y,z,\ldots)$, one can define a Hilbert $\varepsilon$ operator $\varepsilon x\,\phi(x,y,z,\ldots)$ to be $T(\alpha)$ where $\alpha$ is the first ordinal such that $\phi(T(\alpha),y,z,\ldots)$ (when there is one).



The statement V = OD is independent of ZFC. It implies the axiom of choice, but the axiom of choice does not imply V = OD; V = OD is implied by the axiom of constructibility V = L.




When I wrote the above (which is actually a reply to Messing) I was expecting that Bourbaki would define canonical in terms of their $\tau$ operator (Bourbaki's $\varepsilon$ operator). However, I was happily surprised when reading the 'état 9' that Thomas Sauvaget found: they make the correct observation that $\varepsilon$ operators do not generally give canonical objects.



A term is said to be 'canonically associated' to structures of a given species if (1) it makes no mention of objects other than 'constants' associated to such structures and (2) it is invariant under transport of structure. Thus, in the species of two-element fields the terms 0 and 1 are canonically associated to the field F, but $\varepsilon x(x \in F)$ is not, since there is no reason to believe that it is invariant under transport of structure. They also remark that $\varepsilon x(x \in F)$ is actually invariant under automorphisms, so the weaker requirement of invariance under automorphisms does not suffice for being canonical.




To translate 'canonically associated' in modern terms:



1) This condition amounts to saying that the 'term' is definable without parameters, without any choices involved. (Note that the language is not necessarily first-order.)



2) This amounts to 'functoriality' (in the loose sense) of the term over the core groupoid of the concrete category associated to the given species of structures.



So this seems to capture most of the points brought up in the answers to the earlier question.

physics - Mathematical definition of running

There are many mathematical perspectives one could take on running, many of which I think are more interesting than the narrow question you posed. (Since it's a graphics question, another SX site might have been better.)







You might want to (or not) think about putting a sensor on each knee, each foot, each toe, etc., and consider the paths traced out by each sensor. You could use the language of diffeomorphisms and elastic deformations to talk about "small" (or large) deviations. You could also invoke some functional analysis to be a bit more specific about how the paths can deform.



There are a lot of other perspectives you could take--like what about the forces that come up through the heels/metatarsals/toes and travel through both bone and soft matter? Or, finally getting back to what you brought up: the Lie theory of the parameter space of all the angles of the joints. There you're interested in questions that might be answered—or perhaps they'll lead you toward new questions instead—in an introductory differential-geometry or algebraic-topology text. (Spivak DG v1 or Hatcher AT will do.)



But really what you want, I think, are some practical measurements—science derived from kinesiology—rather than pure-mathematics stuff. Ball-and-socket joints move in something like a deformed disk; elbows and knees allow motion in a unit interval; and all of this is tied together with a product that's more complicated than Cartesian (you can't put your hand through your chest, for example). Sort of boring, mathematically; that's the stuff I mentioned above that's covered in an introductory DG or AT text. The more relevant information for you, maybe, will be in empirical/scientific specifics of real bodies.

Thursday, 15 April 2010

conformal geometry - Intuition behind moduli space of curves

(EDIT 1: Replaced hand-waving argument in third paragraph with a hopefully less incorrect version)



(EDIT 2: Added final paragraph about obtaining all conformal deformations for surfaces other than sphere.)



I think it is possible to see the infinitesimal rigidity of the sphere, even if it does involve a PDE as Dmitri says. I think you can also try to see whether, for other embedded surfaces, all infinitesimal deformations of conformal structure are accounted for by deformations of the embedding in a similar way.



For the case of $S^2$, what you want to do is take a normal vector field $V$ (i.e. an infinitesimal change of embedding) and produce a tangent vector field $X$ such that flowing along $X$ gives the same infinitesimal change in conformal structure as flowing along $V$. This should amount to solving a linear PDE, so as Dmitri says a PDE is definitely involved, but probably not as hard as proving the existence of isothermal coordinates (which from memory is non-linear). For the standard embedding of $S^2$ there can't be too many choices for this linear differential operator given that it has to respect the $SO(3)$-symmetry.



I guess we're looking for a first-order equivariant linear operator from normal vector fields to tangent vector fields. If we identify normal fields with functions then two possible candidates are to take $X=\operatorname{grad} V$ or $X$ to be the Hamiltonian flow generated by $V$. I can't think of any others and probably it's possible to prove these are the only such ones. (Assuming it's elliptic, the symbol of the operator must be an $SO(3)$-equivariant isomorphism from $T^*S^2$ to $TS^2$ and there can't be too many choices! Using the metric leads to grad and using the area form leads to the Hamiltonian flow.) Then you just have to decide which one to use.



For the case of a general embedded surface $M$, you can ask "is it possible to obtain all deformations of conformal structure by deforming the embedding into $\mathbb{R}^3$?" To answer this we can again think of a normal vector field as a function $V$ on the surface. There is a second-order linear differential operator
$$
D\colon C^\infty(M) \to \Omega^{0,1}(T)
$$
which sends a normal vector field to the corresponding infinitesimal change of conformal structure. This operator will factor through the Hessian with a homomorphism from $T^* \otimes T^*$ to $T^{*0,1}\otimes T^{1,0}$. The operator $D$ will not be onto, but what we want to know is whether every cohomology class in $H^{0,1}(T)$ has a representative in the image of $D$. At least, this is how I would try to approach the question; I'm sure there are other methods.

Wednesday, 14 April 2010

nt.number theory - How many pairs (M, N) of sets of size n have M + N = {0, 1, ..., n^2-1}?

Let $a$, $b$ be divisors of $n$. Convince yourself that the sets



$M = \{0,1,\ldots,a-1\} + ab \cdot \{0,1,\ldots,n/a-1\}$



$N = \{0,a,\ldots,(b-1)a\} + bn \cdot \{0,1,\ldots,n/b-1\}$



have the desired property. These pairs $(M,N)$ are all distinct as $a$, $b$ vary over divisors other than $1$, $n$. On the other hand the pairs coming from choosing $a=n$ or $b=1$ or $(a,b)=(1,n)$ all coincide with the obvious ("base $n$") pair, while the pair coming from $(a,n)$ coincides with the pair coming from $(1,a)$. So if $\sigma$ is the number of divisors of $n$, this gives you at least $\sigma^2 - 3\sigma + 3$ such pairs. For $n=10$ you can check (e.g. using the generating function method that you described) that this gives all the possibilities.
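Here is a quick sanity check of the construction in Python (my own verification sketch): it builds the pair for each choice of divisors $a$, $b$ and confirms that every element of $\{0,1,\ldots,n^2-1\}$ arises exactly once as a sum.

def pair(n, a, b):
    # the pair (M, N) built from divisors a, b of n, as above
    M = sorted(i + a*b*j for i in range(a) for j in range(n // a))
    N = sorted(a*i + b*n*j for i in range(b) for j in range(n // b))
    return tuple(M), tuple(N)

n = 10
divs = [d for d in range(1, n + 1) if n % d == 0]
pairs = {pair(n, a, b) for a in divs for b in divs}
for M, N in pairs:
    sums = sorted(m + x for m in M for x in N)
    assert sums == list(range(n * n))
print(len(pairs), "distinct pairs for n =", n)  # sigma^2 - 3*sigma + 3 = 7 here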



In general there will be more, because you can iterate the construction: you have pairs of the form



$M = \{0,1,\ldots,d_1-1\} + d_1 d_2 \{0,1,\ldots,d_3-1\} + d_1 d_2 d_3 d_4 \{0,1,\ldots,d_5-1\} + \cdots$



$N = d_1 \{0,1,\ldots,d_2-1\} + d_1 d_2 d_3 \{0,1,\ldots,d_4-1\} + \cdots$



for divisors $d_1,d_2,\ldots$ of $n$ with $d_2 d_4 \cdots d_{2k} = n$ and similarly for the product of the odd-indexed $d$'s. Now collisions between the various pairs should happen precisely whenever any of the $d_i$'s happens to be $1$, and so distinct pairs should come from distinct sequences $d_1,\ldots,d_{\ell}$ of divisors of $n$ other than $1$ such that the odd terms multiply to $n$ and the even terms also multiply to $n$. For instance we find that we've correctly re-counted the $\sigma^2-3\sigma+3$ possibilities that we found and counted before: the sequences of $d$'s of length at most $4$ are



$n,n$



$a,n,n/a$



$a,b,n/a,n/b$



as $a,b$ range over nontrivial divisors of $n$. But of course if $n$ has more than two prime factors, you get longer sequences as well. I'll leave it to someone else to do the enumeration of the sequences.



I think I've convinced myself that this construction gives you everything, but it might be a pain to write down. (Suppose $M$ is the set containing $1$. Take $d_1$ to be the smallest nonzero integer contained in $N$. Then the smallest integers in $N$ have to consist of multiples of $d_1$ up to some $d_1(d_2-1)$, and the next integer contained in $M$ is $d_1 d_2$. But then $d_1 d_2 + 1,\ldots,d_1 d_2 + (d_1-1)$ each have to be in one or the other of the sets, and in fact they have to be in $M$, or else you could form the sum $d_1 (d_2+1)$ in two ways. And so forth.)

Tuesday, 13 April 2010

co.combinatorics - Generating functions for certain statistics on Coxeter groups of type B

Background



In combinatorics one is sometimes interested in various 'statistics'
on a Coxeter group (e.g., functions from the group to the natural
numbers), and in finding a 'nice' expression for a corresponding generating
function. For example, the length function $l$ on a Coxeter group
$W$ is an important statistic, and when $W=S_{n}$ is the symmetric
group on $n$ letters, a classical result of this type is the identity
$$\sum_{w\in S_{n}}t^{l(w)}=\prod_{i=1}^{n}\frac{1-t^{i}}{1-t},$$


where $t$ is an indeterminate (cf. Stanley, Enumerative Combinatorics,
vol. 1, Coroll. 1.3.10). There are also variations on the problem,
where one considers sums over elements $w$ whose right descent set



$$D_{R}(w):=\{x\in W\mid l(wx)<l(w)\}$$



is contained in a given subset $I\subseteq S$ of the fundamental
reflections $S$ of the group $W$. There are several examples in
the literature of sums of the form
$$
\sum_{\substack{w\in W\\ D_{R}(w)\subseteq I}}t^{f(w)}\quad\text{or}\quad\sum_{\substack{w\in W\\ D_{R}(w)\subseteq I}}(-1)^{l(w)}t^{f(w)},
$$



where $f:W\rightarrow\mathbb{N}$ is a given statistic on $W$, and
it is sometimes possible to express these generating functions in
a (non-trivial) simple algebraic way, as in the above example.



Let $[n]$ denote the set $\{1,2,\dots,n\}$, and let $S_{n}^{B}$
be the signed permutation group, that is, the group of all bijections
$w$ of the set $[\pm n]=\{\pm1,\pm2,\dots,\pm n\}$ such that $w(-a)=-w(a)$
for all $a$ in the set (cf. Björner & Brenti: Combinatorics of Coxeter
Groups, 8.1). If $w\in S_{n}^{B}$, we write $w=[a_{1},\dots,a_{n}]$
to mean $w(i)=a_{i}$, for $i=1,\dots,n$. For $i\in[n-1]$, the $i$th
Coxeter generator of $S_{n}^{B}$ is given by
$$
s_{i}:=[1,\dots,i-1,i+1,i,i+2,\dots,n],$$
and we also put
$$s_{0}:=[-1,2,\dots,n].$$



We may therefore identify the set of generators $s_{i}$ with the
set $[n-1]_{0}:=[n-1]\cup\{0\}$. Hence, for any subset $I\subseteq[n-1]_{0}$,
we write $D_{R}(w)\subseteq I$ rather than $D_{R}(w)\subseteq\{s_{i}\mid i\in I\}$.



Questions



In addition to defining a collection of generators, a set $I=\{i_{1},\dots,i_{l}\}\subseteq[n-1]_{0}$,
with $i_{1}<i_{2}<\cdots<i_{l}$, also defines the following polynomial
(related to Gaussian polynomials):



$$\alpha_{I,n}(t):=\frac{(\underline{n})!}{(\underline{i_{1}})!\prod_{r=1}^{l}\prod_{s=1}^{\lfloor(i_{r+1}-i_{r})/2\rfloor}(\underline{2s})}.$$



Here $\lfloor\cdot\rfloor$ denotes the floor function, and we use
the notation $(\underline{0}):=1$, $(\underline{a}):=1-t^{a}$, for
$a\geq1$, and $(\underline{a})!:=(\underline{1})(\underline{2})\cdots(\underline{a})$.
To get a correct formula, we also put $i_{l+1}:=n$.



Define the following statistic on $S_{n}^{B}$:
$$\tilde{L}(w):=\frac{1}{2}\,\bigl|\{(x,y)\in[\pm n]^2\mid x<y,\ w(x)>w(y),\ x\not\equiv y\pmod{2}\}\bigr|.$$



The question is now:




Is it true that for any $n$ and $I$ as above, we have



$$\alpha_{I,n}(t)=\sum_{\substack{w\in S_{n}^{B}\\ D_{R}(w)\subseteq I}}(-1)^{l(w)}t^{\tilde{L}(w)}\,?$$




A less precise but more general question is the following: Given a
family of polynomials $p_{I,n}(t)\in\mathbf{Z}[t]$ depending on $I$
and $n$, is there any general (non-trivial) sufficient criterion
for the existence of functions $f,g:W\rightarrow\mathbb{N}$ on a
finite Coxeter group $W$, such that for all $I$ and $n$, we have



$$p_{I,n}(t)=\sum_{\substack{w\in W\\ D_{R}(w)\subseteq I}}a^{f(w)}t^{g(w)},$$
for some $a\in\mathbf{Z}$?

Monday, 12 April 2010

Does Milnor K-Theory arise from Waldhausen K-Theory

I don't know if there is any evidence for this to be true. Note that Quillen K-groups are defined as homotopy groups of some space (+-construction, Q-construction, Waldhausen construction, etc.), whereas Milnor K-groups were defined in terms of generators and relations,
which generalize the generators and relations for the classical K_2.



More invariantly, Milnor K-groups can be constructed using homology of GL_n (paper of Suslin and Nesterenko) or as certain motivic cohomology groups of a field (Suslin-Voevodsky).
However, these constructions are unrelated to any homotopy groups.



Also, I'm not sure how you would define Milnor K-theory for a general ring R.
(I was interpreting your question with "ring R" replaced by "field F".)

co.combinatorics - Distance measure on weighted directed graphs

Since you're asking for what is "meaningful", I think that one can then argue against some of the question.



The value of metrics on graphs in combinatorial geometry as an area of pure mathematics is more or less the same as their value in applied mathematics, for instance in direction-finding software in Garmin or Google Maps. In any such setting, the triangle inequality is essential for the metric interpretation, but the symmetry condition $d(x,y) = d(y,x)$ is not. An asymmetric metric captures the idea of one-way distance. You can have perfectly interesting geometry of many kinds without the symmetry condition. For instance, you can study asymmetric Banach norms such that $||\alpha x|| = \alpha ||x||$ for $\alpha > 0$, but $||x|| \ne ||-x||$, or the corresponding types of Finsler manifolds.



On the other hand, if you want to restore symmetry, you can. For instance,
$$d'(x,y) = d(x,y) + d(y,x)$$
has a natural interpretation as round-trip distance, which again works equally well for pure and applied purposes. You can also use max directly, or even min with a modified formula. Namely, you can define $d'$ to be the largest metric such that
$$d'(x,y) \le \min(d(x,y),d(y,x))$$
for all $x$ and $y$. All of these have natural interpretations. For instance, the min formula could make sense for streets that are one-way for cars but two-way for bicycles.

set theory - Questions about ordering of reals and irrationals

(3) Here's how to construct an example. We can assume the segment in question is from 0 to 1, non-inclusively. Also, I will write source numbers in binary and target numbers in base 4.



Consider first the number 1/2, in binary 0.1000... Let's map it to 0.1[0102010...] — it doesn't matter what's inside [...] as long as it's irrational. Now we decide to map all numbers of the form 0.0... to 0.0... and 0.1... to 0.2... . Clearly, so far we haven't broken anything.



Now, let's take a different rational number, e.g. 1/4 = 0.010... Similarly, we decide to map it to one of the irrationals of the form 0.01[10011010...], and the segment [1/4, 1/2] is ready to go to 0.02...



Select the next rational number, e.g. 1/3 = 0.0101(01). It's breaking in half the segment destined to go to 0.02... No problem: again we select some irrational 0.021[010012...] for 1/3 and move the left and right subsegments to 0.020... and 0.022...



Now, so far I was using xxx0, xxx1 and xxx2. But let's sometimes move segments to xxx1, xxx2 and xxx3. Let's do it whenever we're on a level which is a square of a natural number.



We're still increasing, yahoo!



Repeat this process for all rationals, ordered by denominator. For any rational we have selected an irrational number by definition. Any irrational is the limit of segments broken down into parts. Each breakdown reveals exactly one digit of the result — so we reconstruct it digit-by-digit (each digit chosen among three possibilities). It has an infinite number of digits. Moreover, these digits never become periodic, thanks to the fact that each $n^2$-digit was shifted by 1. So the result is irrational too.



Since every two irrationals are separated by a rational, this function is always increasing. Quod erat construendum.

Sunday, 11 April 2010

st.statistics - MicroArray, tesing if a sample is the same with high variance data.

What you have is a classic case of a high dimensional, low signal to noise ratio signal. There are a lot of ways to proceed, but ultimately you will want to know about three different effects:



  1. Bayesian estimation.

  2. Dimension reduction.

  3. Superefficient estimation.

These three ideas are frequently conflated by people with an informal understanding of them. So let's clear that up right away.



  1. The Bayesian estimation can always be applied. To estimate the probability of success p of independent flips of a possibly unfair coin you can always have a belief that the coin is fair or loaded, and this can lead to a different posterior distribution. You cannot reduce the dimension of the parameter space - unless you decide that you have what passes for 'revealed wisdom' as to that probability. It is also impossible to perform superefficient estimation in this case because the parameter space does not admit it.

Note that in high dimensional cases, the Bayesian prior may reflect beliefs about noise, parameter spaces, and high dimensional estimation in general, as opposed to the traditional understanding that the Bayesian prior reflects belief about the parameter value. The farther you go into the high dimensional / low signal to noise regime, the less you expect to find anything passing for "expert opinion" that relates to the parameter and the more you expect to find all sorts of "expert opinion" about how hard it is to estimate the parameter. This is an important point: You can throw out the bathwater and keep the baby by knowing a lot about babies. But, and this is the one that most statisticians and engineers don't keep in the front window: You can ALSO throw out the bathwater and keep the baby by knowing a lot about bathwater. One of the strategies for exploiting the high availability of noise is that you can study the noise pretty easily. Another important point is that your willingness to believe that you do not have a good idea how to parameterize the probability density means that you will be considering things like Jeffreys' rule for a prior - the unique prior which provides expectations invariant under change of parameterization. Jeffreys' rule is very thin soup as Bayesian priors go - but in the high dimensional low signal to noise ratio it can be significant. It represents another very important principle of this sort of work: "Don't know? Don't Care". There are a lot of things you are not going to know in your situation; you should arrange as much as possible to line up what you are not going to know with what you will not care about. Don't know a good parameterization? Then appeal to a prior (e.g. Jeffreys' rule) which does not depend on the choice of parameterization.



As an example, in the parameterization of poles and zeros, a finite dimensional linear time invariant system has the Jeffreys' prior given by the hyperbolic transfinite diameter of the set of poles and zeros. It turns out that poles and zeros is a provably exponentially bad general parameterization for the finite dimensional linear time invariant system in high dimensions. But you can use the Jeffreys' prior and know that the expectations you would compute with this bad parameterization will be the same (at least on the blackboard) as if you had computed them in some unknown 'good' parameterization.



  2. Dimension reduction. A high dimensional model is by definition capable of dimension reduction. One can, by various means, map the high dimensional parameter space to a lower dimensional parameter space. For example one can locally project onto the eigenspaces of the Fisher information matrix which have large eigenvalues. A lot of naive information theory is along the lines that "fewer parameters is better". It turns out that is false in general, but sometimes true. There are many "Information criteria" which seek to choose the dimension of the parameter space based on how much the likelihood function increases with the parameter. Be skeptical of these. In actual fact, every finite dimensional parameterization has some sort of bias. It is normally difficult to reduce the dimension intelligently unless you have some sort of side information. For example, if you have a translation invariant system, then dimension reduction becomes very feasible, although still very technical. Dimension reduction always interacts with choice of parameterization. Practical model reduction is normally an ad hoc approach. You want to cultivate a deep understanding of Bayesian and superefficient estimation before you choose a form of dimension reduction. However, lots of people skip that step. There is a wide world of ad hoc dimension reduction. Typically, one gets these by adding a small ad hoc penalty to the likelihood function. For example an L1 norm penalty tends to produce parameter estimates with many components equal to zero - because the L1 ball is a cross-polytope (e.g. octahedron) and the vertices of the cross polytope are the standard basis vectors. This is called compressed sensing, and it is a very active area. Needless to say, the estimates you get from this sort of approach depend critically on the coordinate system - it is a good idea to think through the coordinate system BEFORE applying compressed sensing. We will see an echo of this idea a bit later in superefficient estimation. However it is important to avoid confounding dimension reduction with superefficient estimation; so to prove that I give the example of a parameter with two real unconstrained components. You can do compressed sensing in two dimensions. You cannot do superefficient estimation in the plane. Another very important aspect here is "Who's Asking?". If you are estimating a high dimensional parameter, but the only use that will be made of that parameter is to examine the first component? Stop doing that, OK? It is very worthwhile to parameterize the DECISIONS that will be made from the estimation and then look at the preimage of the decisions in the parameter space. Essentially you want to compose the likelihood function (which is usually how you certify your belief about the observations that have been parameterized) with the decision function, and then maximize that (in the presence of your dimension reduction). You can consider the decision function another piece of the baby/bathwater separation, or you can also look at maximal application of don't care to relieve the pressure on what you have to know.
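To illustrate the sparsity mechanism of the L1 penalty just mentioned (a sketch of mine, not from the original answer): the proximal step for an L1 penalty is soft thresholding, which sets small components exactly to zero.

import numpy as np

def soft_threshold(x, lam):
    # argmin_b 0.5*(b - x)**2 + lam*|b|, applied componentwise;
    # components with |x_i| <= lam land exactly on zero.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

print(soft_threshold(np.array([0.3, -1.2, 0.05, 2.0]), 0.5))
# -> [ 0.  -0.7  0.   1.5]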



  3. Superefficient estimation. In the 1950s, following the discovery of the information inequality (formerly known as Cramer-Rao bound), statisticians thought that the best estimates would be unbiased estimates. To their surprise, embarrassment, and dismay, they were able to show that in three dimensions and higher, this was not true. The basic example is James-Stein estimation. Probably because the word "biased" sounds bad, people adhered much much longer to "unbiased" estimation than they should have. Plus, the other main flavor of biased estimation, Bayesian estimation, was embroiled in an internecine philosophical war among the statisticians. It is appropriate for academic statisticians to attend to the foundations of their subject, and fiercely adhere to their convictions. That's what the 'academy' is for. However, you are faced with the high dimensional low signal to noise ratio case, which dissolves all philosophy. You are in a situation where unbiased estimation will do badly, and an expert with an informative Bayesian prior is not to be found. But you can prove that superefficient estimation will be good in some important senses (and certainly better than unbiased estimation). So you will do it. In the end, the easiest thing to do is to transform your parameter so that the local Fisher information is the identity (we call this the Fisher Unitary parameterization) and then apply a mild modification of James-Stein estimation (which you can find on Wikipedia). Yes, there are other things that you can do, but this one is as good as any if you can do it. There are some more ad hoc methods, mostly called "shrinkage" estimation. There is a large literature, and things like Beran's REACT theory are worth using, as well as the big back yard of wavelet shrinkage. Don't get too excited about the wavelet in wavelet shrinkage - it's just another coordinate transformation in this business (sorry, Harmonic Analysts). None of these methods can beat a Fisher unitary coordinate transformation IF you can find one. Oddly enough, a lot of the work that goes into having a Fisher unitary coordinate transformation is choosing a parameterization which affords you one. The global geometry of the parameter space and superefficient estimation interact very strongly. Go read Komaki's paper:


Komaki, F. (2006). Shrinkage priors for Bayesian prediction. To appear in Annals of Statistics. (http://arxiv.org/pdf/math/0607021)



which makes that clear. It is thought that if the Brownian diffusion (with the Fisher information as Riemannian pseudo-metric) on your parameter space is transient, then you can do superefficient estimation, and not if it is recurrent. This corresponds to the heat kernel on the parameter space, etc. This is very well known in differential geometry to be global information about the manifold. Note that the Bayesian prior and most forms of model reduction are entirely local information. This is a huge and purely mathematical distinction. Do not be confused by the fact that you can show that after you do shrinkage there EXISTS a Bayesian prior which agrees with the shrinkage estimate; that just means that shrinkage estimates have some properties in common with Bayesian estimates (from the point of view of decisions and loss functions, etc.), but it does not give you that prior until you construct the shrinkage estimate. Superefficient estimation is GLOBAL.
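
For concreteness, here is a minimal sketch of the recipe from the superefficient estimation item above: whiten by the (estimated) Fisher information so the problem is locally Fisher unitary, then apply positive-part James-Stein shrinkage toward an arbitrary point. The function names, the Cholesky whitening step, and the default shrinkage target are my own assumptions, not a prescription.

    import numpy as np

    def james_stein(theta_hat, fisher_info, theta_0=None):
        """Shrink the MLE theta_hat toward theta_0 (the arbitrary 'point of
        superefficiency' discussed below); requires dimension >= 3."""
        d = theta_hat.shape[0]
        assert d >= 3, "superefficiency of this form needs dimension >= 3"
        if theta_0 is None:
            theta_0 = np.zeros(d)                   # an arbitrary choice, as the text stresses
        L = np.linalg.cholesky(fisher_info)         # Fisher information I = L L^T
        z = L.T @ (theta_hat - theta_0)             # Fisher unitary coordinates: cov(z) ~ identity
        shrink = max(0.0, 1.0 - (d - 2) / (z @ z))  # positive-part James-Stein factor
        return theta_0 + np.linalg.solve(L.T, shrink * z)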



One other excruciatingly scary aspect of superefficient estimation is that it is extremely "non-unique". There is an inexplicable arbitrary choice, typically of a "point of superefficiency", though it is more like a measure of superefficiency that started out as an atomic measure. You have to choose this thing. There is no reason whatever for you to make one or another available choice. You might as well derive your choice from your social security number. And the parameter estimate that you get, as well as the performance of that estimate, depends on that choice. This is a very important reason that statisticians hated this kind of estimation. But they also hate the situation where you have tons of variance and grams of data, and that is where you are. You can prove (e.g. Komaki's paper gives an example of such a proof) that your estimation will be better if you make this choice, so you're going to do it. Just don't expect to ever understand much about that choice. Apply the don't know/don't care postulate - you will not know, so you're better off not caring. The defense of your estimation is the theorem that proves it's better.



It should be very clear now that these three "nonclassical" effects in estimation theory are really distinct. I think most people don't really understand that. And to some extent it's easy to see why.



Suppose you have an overparameterized generalized linear model (GLM), so your Fisher information is singular, but you do something like iteratively reweighted least squares for the Fisher scoring (because that's what you do with a GLM), and it turns out that the software you use solves the system with, say, Householder QR with column pivoting. It's far enough down under the covers that many statisticians performing this estimation would not necessarily know that is what was happening. But because the QR with column pivoting regularizes the system, effectively it is estimating the parameter in a reduced dimension system, where the reduction was done in a Fisher unitary coordinate system (because of the R in the QR factorization). It's really hard for people to understand what they are really doing when they are not aware of the effect of each step. We used to call this "idiot regularization", but I think "sleepwalker" is more accurate than "idiot". But what if the software package used modified Cholesky to solve the system? Well, that actually amounts to a form of shrinkage (again in a Fisher unitary coordinate system); it can also be considered a form of ("maximum a posteriori") Bayesian estimation.
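
Here is a minimal sketch of the "sleepwalker regularization" just described: solving a rank-deficient least-squares system by column-pivoted QR and truncating the small diagonal entries of R. The tolerance, the toy interface, and the availability of scipy are my own assumptions.

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def qr_truncated_solve(X, y, rtol=1e-10):
        Q, R, piv = qr(X, mode='economic', pivoting=True)
        # Effective rank: where |R[k,k]| falls below rtol * |R[0,0]|, the
        # remaining directions are silently discarded - dimension reduction in disguise.
        diag = np.abs(np.diag(R))
        k = int(np.sum(diag > rtol * diag[0]))
        beta_k = solve_triangular(R[:k, :k], Q[:, :k].T @ y)
        beta = np.zeros(X.shape[1])
        beta[piv[:k]] = beta_k                  # undo the column pivoting
        return beta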



So in order to sort out what these effects do, and what that means you should do, you need a reasonably deep understanding of the custody of your digits all the way from the data being observed through the final decision being taken (treatment, grant proposal, etc.).



At this point (if you're still with me) you might want to know why I didn't just write a one line recommendation of some method suitable for beginners.



Well, there isn't one. If you really are in the high dimensional, low signal to noise case, then a high quality result is only available to someone who understands a pretty big part of estimation theory. This is because all three of these distinct effects can improve your result, but in each situation it is difficult to predict what combination will be the most successful. In fact, the reason you cannot prescribe a method in advance is precisely because you are in the regime you are in. You can bail out and do something out of a canned program, but that runs a real risk of leaving a good deal of the value on the table.

ag.algebraic geometry - A useful form of principle of connectedness

As we know, there are a lot of principles of connectedness in algebraic geometry. Here is a useful and interesting one:



Suppose $T$ is an integral curve over $k$ and $X \to T$ is a flat family of closed subvarieties of $\mathbb{P}^n_k$. If there is a non-empty open subset $U \subset T$ such that the fiber $X_t$ is connected for every closed point $t \in U$, then every fiber $X_t$ is connected.



In view of upper semicontinuity, this says that if the parameter space is a curve and $h^0(X_t, \mathcal{O}_{X_t})$ is locally constant on some open set, then it is locally constant everywhere! (If we further require $k$ to be algebraically closed here.)



I think this property is interesting and useful, but I can't prove it, and every reference I can find traces back to Hartshorne's exercise III.11.4. If anyone can give me a proof, I would be very grateful!

A graph is four colorable if and only if it is planar?




Is this true? I know that if a graph is planar then it is four colorable, but is it true that if a graph is four colorable it must be planar?



(EDIT) The following would have been a better way for me to have asked the question:
What are the requirements for a graph to be planar?
What are the requirements for a graph to be 4 colorable?
Is there a simple characterization of the graphs that are four colorable but not planar?

Friday, 9 April 2010

gr.group theory - Elements living in the conjugacy class and in the centralizer of an m-cycle in Am

Thank you, Douglas.



With the notation given above and that given in the paper of Marek Szyjewski (which you referred me to), the following statements are equivalent:
1) $x^j \in C$,
2) $\operatorname{sgn}(\lambda_j) = 1$,
3) $J(j,m) = 1$, where $J$ is the Jacobi symbol.



1) <=> 2) is easy (I have checked it).
2) <=> 3) is Theorem 1 of Marek Szyjewski's paper. The article is as yet unpublished. I have not had time to check all of it; I have only checked Case 1, but I guess that Cases 2 and 3 are correct. (?)



I am interested in the case $m = 3p$, with $p > 3$ prime. I need to prove that there exists $j$, with $j \bmod 3 = 2$, such that $x^j \in C$.
This amounts to proving that there exists $j$, $0 < j < m$, such that:
-) $(j,m) = 1$,
-) $j \bmod 3 = 2$ (i.e. $J(j,3) = -1$),
-) $J(j,p) = -1$,
because $J(j,m) = J(j,3)\,J(j,p)$.



Do you have any clue for that?



Thank you in advance.

Thursday, 8 April 2010

geometry - Generic coordinate system representations

Please excuse the verbosity which follows: the question is rather basic, so I would like to state it carefully so that it will not be accidentally dismissed as trivial. If, after my discussion, it still appears trivial, then I am sorry.



First I begin with some motivation. Say we want to define a vector field (in the sense of advanced calculus rather than, say, abstract manifolds) in $\mathbb{R}^3$. Then we need a way to consistently define (a.) points -- that is, the unique positions for each point in the space, and (b.) a directional basis at each point. (When saying "consistently define", I essentially mean that this information can be prescribed with a single set of parametrizations, rather than, say, an uncountable set of {point, basis} pairs.)



In the natural Cartesian coordinates, these requirements are satisfied by providing three particular coordinate functions. These coordinate functions are orthonormal, unit speed curves, placed at a designated origin. The tangent space at each point is also typically given in terms of the same basis.



When we start with a Euclidean space, we have no preferred origin and no preferred directions. The example of Cartesian coordinates seems to suggest that we can axiomatize or automate the generation of an arbitrary coordinate system by specifying three non-coplanar, or possibly everywhere non-coplanar curves.



On the other hand, consider another common choice of coordinates like spherical coordinates. In spherical coordinates (and curvilinear coordinates in general) the basis for a tangent space depends on the point. Furthermore, there does not seem to be a simple prescription of three curves which parametrize the space, since the coordinate curves themselves are re-defined for different points in space.



To go even further, one could choose an entirely arbitrary, non-orthogonal coordinate system. For a fully arbitrary system, we do not even care about its parametrization by the Cartesian coordinates. One begins to wonder: what property of this arbitrary system is actually grounding the satisfaction of the initial requirements, (a.) and (b.)? Specifically,



What are the requirements necessary for an arbitrary coordinate system such that it satisfies (a.) and (b.)?



For example, I might conjecture one answer: the coordinate system needs three functions defined on $\mathbb{R}^3$ such that at all points their gradients are non-coplanar (i.e., linearly independent). Unfortunately, I am not even sure if this makes sense, since this answer still requires the specification of the Cartesian coordinates (in order to take the gradients, or convert the gradients into the new coordinates), which we were attempting to circumvent in the first place!
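
For what it's worth, the conjectured criterion can at least be tested symbolically in the Cartesian frame (granting the circularity just noted). Here is a sketch, entirely my own, using spherical coordinates as the test case; the gradients are independent exactly where the Jacobian determinant is nonzero.

    import sympy as sp

    x, y, z = sp.symbols('x y z', real=True, positive=True)

    # Spherical coordinate functions expressed in Cartesian coordinates.
    r     = sp.sqrt(x**2 + y**2 + z**2)
    theta = sp.acos(z / r)
    phi   = sp.atan2(y, x)

    # Rows are the gradients of the three coordinate functions.
    J = sp.Matrix([[sp.diff(f, v) for v in (x, y, z)] for f in (r, theta, phi)])
    det = sp.simplify(J.det())
    print(det)   # nonzero away from the z-axis, where spherical coordinates degenerate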



I would especially appreciate any attempt at making this discussion more rigorous or precise. Also I would appreciate any references which give a thorough discussion of these ideas.

ag.algebraic geometry - Sections of etale morphisms

We all know that smooth morphisms have sections etale locally. However, the following similar statement is not obvious for me:



Suppose $X \to Y \to Z$, where $X$ is etale over $Y$ and $Y$ is finite and surjective over $Z$. Then a section of $X \to Y$ exists etale locally on $Z$, i.e. there exists an etale cover $U$ of $Z$ such that $X_U \to Y_U$ has a section, where $(-)_U$ denotes pullback to $U$.



I think it is supposed to be easy.



Can anyone explain this to me? Thanks.

Wednesday, 7 April 2010

set theory - Are there as many real-closed fields of a given cardinality as I think there are?

Hi Pete!



There's been a lot of study of this and similar problems. I believe that Shelah's theorem, from his 1971 paper "The number of non-isomorphic models of an unstable first-order theory" (Israel J. of Math) answers your question about real closed fields in the positive.



The best big result on such questions that I know of is in the 2000 Annals paper "The uncountable spectra of countable theories" by Hart, Hrushovski, Laskowski.



To answer the question on real closed fields specifically (and somewhat cautiously since I'm not a model-theorist):



The theory of real closed fields is a complete first order theory with countable language. It is also an unstable theory (an easy fact, I think, and explained better on Wikipedia than I could explain it). Hence Shelah's result applies, and the bound $2^\kappa$ is realized, as you surmised.



Bonus points should go to Shelah (and perhaps also to Hart, Hrushovski, Laskowski, whose paper mentions the result of Shelah and proves other things) for proving that this bound is realized (for uncountable cardinals), except for theories $T$ which have all of the following properties:



  1. $T$ has infinite models.

  2. $T$ is superstable.

  3. $T$ has prime models over pairs.

  4. $T$ does not have the dimensional order property.

I have no clue what the fourth property means. But there are plenty of non-superstable theories to which Shelah's theorem applies, and hence which realize your bound (for uncountable cardinals).



For countable cardinality, I think there are still some open problems about how many non-isomorphic models there can be of a given theory with cardinality $\aleph_0$.

reference request - What is a good roadmap for learning Shimura curves?

First of all, Kevin is being quite modest in his comment above: his paper




Buzzard, Kevin. Integral models of certain Shimura curves. Duke Math. J. 87 (1997), no. 3, 591--612.




contains many basic results on integral models of Shimura curves over totally real fields, and is widely cited by workers in the field: 22 citations on MathSciNet. The most recent is a paper of mine:




Clark, Pete L. On the Hasse principle for Shimura curves. Israel J. Math. 171 (2009), 349--365.



http://math.uga.edu/~pete/plclarkarxiv7.pdf




Section 3 of this paper spends 2-3 pages summarizing results on the structure of the canonical integral model of a Shimura curve over $\mathbb{Q}$ (with applications to the existence of local points). From the introduction to this paper:



"This result [something about local points] follows readily enough from a description of their [certain Shimura curves over Q] integral canonical models. Unfortunately I know of no unique, complete reference for this material. I have myself written first (my 2003 Harvard thesis) and second (notes from a 2005 ISM course in Montreal) approximations of such a work, and in so doing I have come to respect the difficulty of this expository problem."



I wrote that about three years ago, and I still feel that way today. Here are the documents:



1) http://math.uga.edu/~pete/thesis.pdf



is my thesis. "Chapter 0" is an exposition on Shimura curves: it is about 50 pages long.



2) For my (incomplete) lecture notes from 2005, go to



http://math.uga.edu/~pete/expositions.html



and scroll down to "Shimura Curves". There are 12 files there, totalling 106 pages [perhaps I should also compile them into a single file]. On the other hand, the title of the course was Shimura Varieties, and although I don't so much as attempt to give the definition of a general Shimura variety, some of the discussion includes other PEL-type Shimura varieties like Hilbert and Siegel moduli space. These notes do not entirely supersede my thesis: each contains some material that the other omits.



When I applied for an NSF grant 3 years ago, I mentioned that if I got the grant, as part of my larger impact I would write a book on Shimura curves. Three years later I have written up some new material (as yet unreleased) but am wishing that I had not said that so directly: I would need at least a full semester off to make real progress (partly, of course, to better understand much of the material).



Let me explain the scope of the problem as follows: there does not even exist a single, reasonably comprehensive reference on the arithmetic geometry of the classical modular curves (i.e., $X_0(N)$ and such). This would-be bible of modular curves ought to contain most of the material from Shimura's book (260 pages) and the book of Katz and Mazur Arithmetic Moduli of Elliptic Curves (514 pages). These two books don't mess around and have little overlap, so you get a lower bound of, say, 700 pages that way.



Conversely, I claim that there is some reasonable topology on the arithmetic geometry of modular curves whose compactification is the theory of Shimura curves. The reason is that in many cases there are several ways to establish a result about modular curves, and "the right one" generalizes to Shimura curves with little trouble. (For example, to define the rational canonical model for classical modular curves, one could use the theory of Fourier expansions at the cusps -- which won't generalize -- or the theory of moduli spaces -- which generalizes immediately. Better yet is to use Shimura's theory of special points, which nowadays you need to know anyway to study Heegner point constructions.) Most of the remainder concerns quaternion arithmetic, which, while technical, is nowadays well understood and worked out.

co.combinatorics - Strings and "co-subsequences"

Since you are taking the complement of a substring, and it appears that there may be no firmly established terminology, I propose:



  • a substring complement is what remains after deleting a substring,

  • and more generally, a subsequence complement is what remains after deleting a subsequence.

Thus, one may refer to the substring complement of s in t, and use the notation t - s, or $t \setminus s$, with the same notation for the subsequence complement.
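
A toy illustration of the two proposed notions (the function names are mine, and "first occurrence" / "greedy left-to-right match" are arbitrary disambiguation choices, since s may occur in t in several ways):

    def substring_complement(t, s):
        # Remove the first occurrence of s as a contiguous substring of t.
        i = t.index(s)
        return t[:i] + t[i + len(s):]

    def subsequence_complement(t, s):
        # Remove the characters of s in order, matching greedily from the left.
        out, i = [], 0
        for ch in t:
            if i < len(s) and ch == s[i]:
                i += 1          # this character is "used up" by the subsequence
            else:
                out.append(ch)
        return ''.join(out)

    print(substring_complement("banana", "ban"))    # -> "ana"
    print(subsequence_complement("banana", "aaa"))  # -> "bnn"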



I would prefer this natural language terminology over the alternative co-substring and co-subsequence, which sound unnecessarily technical to my ear, but this difference may be slight.



It does seem worthwhile, however, to distinguish between the two cases, and so I would argue against using the term co-subsequence, as you suggested in your question, to refer to the substring complement.

cv.complex variables - Branched coverings of Riemann surfaces with specified branch points.

As mentioned by David, the answer is "no": if $\Sigma$ is the Riemann sphere and $\mathfrak{p}$ consists of a single point, then there is no such covering. (Indeed, in this case $\Sigma \setminus \mathfrak{p}$ is simply-connected, so there is only one ramified cover, and this cover has degree one...)



If you have more than one point, or $\Sigma$ is not the sphere, then the answer is "yes". In the case of the sphere with two points, you can use $z \mapsto z^d$ (up to conformal change of coordinate), and this is the only choice.



Otherwise, your cover $S$ can be chosen to be the unit disc. Indeed, consider the surface as an "orbifold" with some given ramification indices at your given points. It is known that this orbifold has a universal covering, which means that there is a holomorphic map from the sphere, plane or disc which is ramified at each preimage of the ramification points, with multiplicity a multiple of the given ramification index. In most cases, in particular if the ramification indices are taken sufficiently large, the universal cover will be the unit disc, as claimed.



This argument also works for infinite but discrete subsets of $\Sigma$.



In fact, it is not hard to see that the result will be true for any countable collection of branched values. (Here of course the map will not be an orbifold covering. However, using the orbifold covering argument we first prove the result for the disc and a discrete set of points, and then apply the orbifold argument again...)



(Of course, even if the surface $\Sigma$ is compact, this argument gives a non-compact cover - which may or may not be what you had in mind.)

Tuesday, 6 April 2010

co.combinatorics - Sperner's theorem and "pushing shadows around"

Let $\mathcal{B}$ be a collection of $k$-sets, subsets of an $n$-set $S$. For $k < n$ define the shade of $\mathcal{B}$ to be $$\nabla \mathcal{B} = \{ D \subset S : |D| = k+1,\ \exists B \in \mathcal{B},\ B \subset D \}$$ and the shadow of $\mathcal{B}$ to be $$\Delta \mathcal{B} = \{ D \subset S : |D| = k-1,\ \exists B \in \mathcal{B},\ D \subset B \}$$ Sperner proved the lemma: $|\Delta \mathcal{B}| \geq \frac{k}{n-k+1}|\mathcal{B}|$ for $k > 0$, and $|\nabla \mathcal{B}| \geq \frac{n-k}{k+1}|\mathcal{B}|$ for $k < n$.



This implies that if $k \le \frac{1}{2}(n-1)$ then $|\nabla \mathcal{B}| \geq |\mathcal{B}|$, and if $k \geq \frac{1}{2}(n+1)$ then $|\Delta \mathcal{B}| \geq |\mathcal{B}|$. Now the proof proceeds in the obvious steps: suppose we have an antichain $\mathcal{A}$, and it has $p_i$ elements of size $i$. If $i < \frac{1}{2}(n-1)$ then from the above we can replace them with $p_i$ elements of size $i+1$, and similarly for the elements of size $i > \frac{1}{2}(n+1)$. You end up with an antichain of elements of size $\lfloor \frac{n}{2} \rfloor$, giving you the proof of Sperner's theorem.
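
The two bounds in the lemma are easy to sanity-check by brute force on a small ground set. A throwaway sketch (entirely my own; the exhaustive loop is only feasible for tiny $n$ and $k$):

    from itertools import combinations

    def shade(family, ground):
        # All (k+1)-sets containing some member of the family.
        return {b | {x} for b in family for x in ground if x not in b}

    def shadow(family):
        # All (k-1)-subsets of some member of the family.
        return {frozenset(d) for b in family for d in combinations(b, len(b) - 1)}

    n, k = 5, 2
    ground = range(n)
    ksets = [frozenset(c) for c in combinations(ground, k)]
    for mask in range(1, 1 << len(ksets)):          # every non-empty family of k-sets
        B = [s for i, s in enumerate(ksets) if mask >> i & 1]
        assert (k + 1) * len(shade(B, ground)) >= (n - k) * len(B)
        assert (n - k + 1) * len(shadow(B)) >= k * len(B)
    print("both bounds verified for all families of 2-subsets of a 5-set")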



I really enjoyed Combinatorics of finite sets by I. Anderson which treats Sperner's theorem, LYM, Erdos-Ko-Rado, Kruskal-Katona etc. The above is just a sketch but there you can find all the details.

Sunday, 4 April 2010

The definition of homotopy in algebraic topology

In this post, let $I=[0,1]$.



Something about the definition of homotopy in algebraic topology (and in particular in the study of the fundamental group) always puzzled me. Books on the fundamental group often begin with the basic notion of a homotopy of curves (or more generally, continuous functions between topological spaces) and describe it intuitively as "a continuous deformation of one curve into another". They often supplement this statement with some nice picture, like this one in Wikipedia. When I was taught algebraic topology, I too heard a motivating explanation as above and was shown a picture of this sort. From this I could already guess what a (supposedly) natural formal definition would be. I expected it to look something like this:




Let $X$ be a topological space and let $f,g : I \to X$ be two curves in $X$. Then a homotopy between $f$ and $g$ is a family of curves $h_t : I \to X$ indexed by $t \in I$ (the "time" parameter) such that $h_0 = f$, $h_1 = g$, and the function $t \mapsto h_t$ is continuous from $I$ to $C(I,X)$ (the space of curves in $X$ with domain $I$, equipped with some suitable topology).




However, the definition given (which is used in every book on algebraic topology which I sampled) is similar, but not quite what I thought. It is defined as a continuous function $H : I \times I \to X$ such that $H(s,0) = f(s)$ and $H(s,1) = g(s)$ for all $s \in I$.



This actually quite surprised me, for several reasons. First, the intuitive definition of a homotopy as a "continuous deformation" contains no mention of points in the space $X$ - it gives the feeling that it is the paths that matter, not the points of the underlying space (though obviously one needs the space in question to define the space of paths $C(I,X)$). However, the above definition, while formally almost equivalent to the definition I thought of (up to a choice of a "good" topology on $C(I,X)$), makes the underlying space $X$ quite explicit, since it appears explicitly as the codomain of the homotopy.



Moreover, many of the properties related to homotopies, the fundamental group and covering spaces can be expressed using the vocabulary of category theory, using universal properties. Now, from a categorical-theoretic point of view, wouldn't one want to suppress the role of the underlying space as much as one can (in favor of its maps and morphisms)?



Additionally, the definition of homotopy (as used) seems notationally inconvenient to me, in that it is less clear which of the two variables is the time parameter (each mathematician has his own preference, it seems). Also, the definition of many specific homotopies looks needlessly complicated in this notation, IMO. For instance, if $f,g$ are two curves in $\mathbb{R}^n$ then they are homotopic, and one can write the obvious homotopy either as $H(s,t) = tf(s) + (1-t)g(s)$ or as $h_t = tf + (1-t)g$. Maybe that's just me, but the second notation seems much more natural and easier to understand than the first one. Formulae of this sort appear frequently in the study of the fundamental group of various spaces (and in the verification that the fundamental group is indeed a group), and using the $H(s,t)$ notation makes these formulae much more cumbersome, in my opinion.
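
(Purely as an illustration of the two notations - this is my own toy, with Python closures standing in for the exponential law $C(I \times I, X) \cong C(I, C(I,X))$ that question 1 below is really about:)

    import numpy as np

    f = lambda s: np.array([np.cos(np.pi * s), np.sin(np.pi * s)])  # a curve in R^2
    g = lambda s: np.array([1.0 - 2.0 * s, 0.0])                    # a curve with the same endpoints

    # The textbook notation: one continuous function of two variables.
    H = lambda s, t: t * f(s) + (1 - t) * g(s)

    # The notation proposed above: a path of curves t |-> h_t, i.e. H curried.
    h = lambda t: (lambda s: H(s, t))

    assert np.allclose(h(0.0)(0.25), g(0.25))  # h_0 = g for this H
    assert np.allclose(h(1.0)(0.25), f(0.25))  # h_1 = f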



So, to sum up, I have two questions:




1) For a topological space $X$, can $C(I,X)$ be (naturally) topologized so that "my" definition of homotopy (see above) and the usual definition coincide (by setting $h_t(x) = H(x,t)$)?

2) If so, why isn't such a definition preferred? See my arguments above.


Saturday, 3 April 2010

ag.algebraic geometry - Is the inertia stack of a Deligne-Mumford stack always finite?

I'm not sure about the suggested equivalence in the last two sentences of your question, but at least the statement about etale group schemes has a negative answer.



That is, it is possible to have an etale group scheme $G \rightarrow S$, with $G$ and $S$ both finite type over a field $k$, but $G$ not finite over $S$.



For example, let $H$ be the constant group scheme $\mathbb{Z}/2\mathbb{Z}$ over $S$, let $s$ be some fixed closed point of $S$, and let $G := H \setminus \{1_s\}$, where $1_s$ is the non-zero element of the fibre $(\mathbb{Z}/2\mathbb{Z})_s$.



Then $G$ is open in $H$, hence etale over $S$. Assuming that $S$ is positive dimensional, it is certainly not finite (we deleted one point of one fibre), and it is a subgroup scheme of $H$. (If $T$ is an $S$-scheme, then $G(T)$ is the subgroup of $H(T)$ consisting of points whose values at points of $T$ lying over $s$ are trivial.)

Is the convex combination of two potential games a potential game?

My question: is the set of potential games closed under convex combinations?



An $n$ player game with action set $A = A_1 \times \ldots \times A_n$ and payoff functions $u_i$ is called an exact potential game if there exists a potential function $\Phi$ such that:
$$\forall_{a \in A}\ \forall_{a_i, b_i \in A_i}\quad \Phi(b_i, a_{-i}) - \Phi(a_i, a_{-i}) = u_i(b_i, a_{-i}) - u_i(a_i, a_{-i})$$



A game is a general (ordinal) potential game if there exists a potential function $\Phi$ such that:
$$\forall_{a \in A}\ \forall_{a_i, b_i \in A_i}\quad \operatorname{sgn}(\Phi(b_i, a_{-i}) - \Phi(a_i, a_{-i})) = \operatorname{sgn}(u_i(b_i, a_{-i}) - u_i(a_i, a_{-i}))$$



Potential games are interesting because they always have pure strategy Nash equilibria: in particular, a sequence of best-responses must eventually converge to one.



Say that we have two games on the same action set, with utility functions $u_i$ and $u'_i$ respectively, for each player $i$. For any $0 \leq p \leq 1$, there is a convex combination of these two games, again on the same action set, where the utility function for each player $i$ is now $u^p_i(\cdot) = (1-p)u_i(\cdot) + p u'_i(\cdot)$.



Clearly, the convex combination of two exact potential games is also an exact potential game: just take the same convex combination of the two potential functions.



But is it possible to have two (general) potential games such that their convex combination is not a potential game?
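
(Not an answer, but if one wants to hunt for a counterexample numerically, here is a sketch of a brute-force test - entirely my own construction. For finite games, acyclicity of the strict unilateral improvement graph characterizes the slightly weaker notion of a generalized ordinal potential, by Monderer and Shapley, and is in particular a necessary condition for an ordinal potential as defined above.)

    from itertools import product

    def improvement_acyclic(u1, u2):
        # u1, u2: dicts mapping profiles (a1, a2) to payoffs for players 1, 2.
        profiles = list(u1)
        # Directed edge a -> b iff b differs from a in one player's action
        # and strictly improves that player's payoff.
        edges = {a: [] for a in profiles}
        for a, b in product(profiles, repeat=2):
            if a[1] == b[1] and a[0] != b[0] and u1[b] > u1[a]:
                edges[a].append(b)
            if a[0] == b[0] and a[1] != b[1] and u2[b] > u2[a]:
                edges[a].append(b)
        # DFS cycle detection: the graph is acyclic iff the game has the
        # finite improvement property.
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {a: WHITE for a in profiles}
        def has_cycle(a):
            color[a] = GRAY
            for b in edges[a]:
                if color[b] == GRAY or (color[b] == WHITE and has_cycle(b)):
                    return True
            color[a] = BLACK
            return False
        return not any(color[a] == WHITE and has_cycle(a) for a in profiles)

    # Usage: build the payoff dicts of the convex combination
    # u^p = (1-p)*u + p*u' and call improvement_acyclic on them.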

Friday, 2 April 2010

ho.history overview - What is the origin of the term "spectrum" in mathematics?

I don't know the full story, but I found the following interesting tidbits in History of Functional Analysis by Dieudonné.



On page 171 he writes the following about physicists in the 1920s:
"It finally dawned upon them that their "observables" had properties which made them look
like hermitian operators in Hilbert space, and that, by an extraordinary coincidence, the "spectrum" of Hilbert (a name which he had apparently chosen from a superficial analogy) was
to be the central conception in the explanation of the "spectra" of atoms."



Dieudonné earlier writes (page 150):
"Although Hilbert does not mention Wirtinger's paper, it is
probable that he had read it (it is quoted by several of his
pupils), and it may be that the name "Spectrum" which he used
came from it; but it is a far cry from the vague ideas of
Wirtinger to the extremely general and precise results of
Hilbert."



He's referring to the same paper by Wirtinger referred to in Gjergji Zaimi's answer.

Thursday, 1 April 2010

geometric langlands - Double affine Hecke algebras and mainstream mathematics

Well the first thing to say is to look at the very enthusiastic and world-encompassing papers of Cherednik himself on DAHA as the center of the mathematical world (say his 1998 ICM).
I'll mention a couple of more geometric aspects, but this is a huuuge area..



There are at least three distinct geometric appearances of DAHA, which you could classify by the number of loops (as in loop groups) that appear - two, one or zero.
(BTW for those in the know I will mostly intentionally ignore the difference between DAHA and its spherical subalgebra.)



Double loop picture: See e.g. Kapranov's paper arXiv:math/9812021 (notes for lectures of his on it available on my webpage) and the related arXiv:math/0012155. The intuitive idea, very hard to make precise, is that DAHA is the double loop (or 2d local field, such as F_q((s,t)) ) analog of the finite (F_q) and affine (F_q((s)) ) Hecke algebras. In other words it appears as functions on double cosets for the double loop group and its "Borel" subalgebra. (Of course you need to decide what "functions" or rather "measures" means and what "Borel" means..) This means in particular it controls principal series type reps of double loop groups, or the geometry of moduli of G-bundles on a surface, looked at near a "flag" (meaning a point inside a curve inside the surface). The rep theory over 2d local fields that you would need to have for this to make sense is studied in a series of papers of Kazhdan with Gaitsgory (arXiv:math/0302174, 0406282, 0409543), with Braverman (0510538) and most recently with Hrushovski (0510133 and 0609115). The latter is totally awesome IMHO, using ideas from logic to define definitively what measure theory on such local fields means.



Single loop picture: Affine Hecke algebras have two presentations, the "standard" one (having to do with abstract Kac-Moody groups) and the Bernstein one (having to do specifically with loop groups). These two appear on the two sides of Langlands duality (cf. e.g. the intro to the book of Chriss and Ginzburg). Likewise there's a picture of DAHA that's dual to the above "standard" one. This is developed first in Garland-Grojnowski (arXiv:q-alg/9508019) and more thoroughly by Vasserot arXiv:math/0207127 and several papers of Varagnolo-Vasserot. The idea here is that DAHA appears as the K-group of coherent sheaves on $G(O) \backslash G(K) / G(O)$ - the loop group version of the Bruhat cells in the finite flag manifold (again ignoring Borels vs parabolics). Again this is hard to make very precise. This gives in particular a geometric picture for the reps of DAHA, analogous to that for AHA due to Kazhdan-Lusztig (see again Chriss-Ginzburg).



[EDIT: A new survey on this topic by Varagnolo-Vasserot has just appeared.]



Here is where geometric Langlands comes in: the above interpretation means that DAHA is the Hecke algebra that acts on (K-groups of) coherent sheaves on T^* Bun_G X for any Riemann surface X -- it's the coherent analog of the usual Hecke operators in geometric Langlands.
Thus if you categorify DAHA (look at CATEGORIES of coherent sheaves) you get the Hecke functors for the so-called "classical limit of Langlands" (cotangent to Bun_G is the classical limit of diffops on Bun_G).



The Cherednik Fourier transform gives an identification between DAHA for $G$ and the dual group $G'$. In this picture it is an isomorphism between K-groups of coherent sheaves on Grassmannians for Langlands dual groups (the categorified version of this is conjectured in Bezrukavnikov-Finkelberg-Mirkovic arXiv:math/0306413). This is a natural part of the classical limit of Langlands: you're supposed to have an equivalence between coherent sheaves on cotangents of Langlands dual Bun_G's, and this is its local form, identifying the Hecke operators on the two sides!



In this picture DAHA appears recently in physics (since geometric Langlands in all its variants does), in the work of Kapustin (arXiv:hep-th/0612119 and with Saulina 0710.2097) as "Wilson-'t Hooft operators" --- the idea is that in SUSY gauge theory there's a full DAHA of operators (with the above names). Passing to the TFT which gives Langlands kills half of them - a different half on the two sides of Langlands duality, hence the asymmetry.. but in the classical version all the operators survive, and the SL2Z of electric-magnetic/Montonen-Olive S-duality is exactly the Cherednik SL2Z you mention..



Finally (since this is getting awfully long), the no-loop picture: this is the one you referred to in 2. via Dunkl type operators. Namely DAHA appears as difference operators on H/W (and its various degenerations, the Cherednik algebras, appear by replacing H by h and difference by differential). In this guise (and I'm not giving a million refs to papers of Etingof and many others since you know them better) DAHA is the symmetries of quantum many-body systems (Calogero-Moser and Ruijsenaars-Schneider systems, to be exact), and this is where Macdonald polynomials naturally appear as the quantum integrals of motion.
The only thing I'll say here is to point to some awesome recent work of Schiffmann and Vasserot arXiv:0905.2555, where this picture too is tied to geometric Langlands..
very very roughly the idea is that H/W is itself (a degenerate version of an open piece of) a moduli of G-bundles, in the case of an elliptic curve. Thus studying DAHA is essentially studying D-modules or difference modules on Bun_G in genus one (see Nevins' paper arXiv:0804.4170 where such ideas are developed further). Schiffmann-Vasserot show how to interpret Macdonald polynomials in terms of geometric Eisenstein series in genus one..
enough for now.

fa.functional analysis - Barrelled, bornological, ultrabornological, semi-reflexive, ... how are these used?

First, there is a great survey of locally convex topological spaces in section 424 of the Encyclopedic Dictionary of Mathematics. (The EDM, if you have not seen it, is a fabulous reference for all kinds of information. Even though we all have Wikipedia, the EDM is still great.) At the end it has a chart of many of these properties of topological vector spaces, indicating dependencies, although not all of them. This chart helped me a lot with your question. On that note, I am not a functional analyst, but I can play one on MO. Maybe a serious functional analyst can give you a better answer. (Or a worse answer? I wonder if to some analysts, everything is the machine.)



Every important topological vector space that I have seen in mathematics (of those that are over $\mathbb{R}$ or $\mathbb{C}$) is at least a Banach space, or an important generalization known as a Frechet space, or is derived from one of the two. We all learn what a Banach space is; a Frechet space is the same thing except with a countable family of seminorms instead of one norm. Many of the properties that you list, for instance metrizable and bornological, hold for all Frechet spaces. In emphasizing Banach and Frechet spaces, the completeness property is implicitly important. Since a normed linear space is a metric space, you might as well take its completion, which makes it Banach. A Frechet space is an instance of a generalization of a metric space known as a uniform space, and you might as well do the same thing. Also the discussion is not complete without mentioning Hilbert spaces. You can think of a Hilbert space either as a construction of a type of Banach space, or as a Banach space that satisfies the parallelogram law; of course it's important.



Of the properties that do not hold for all Frechet spaces, I can think of four that actually matter: reflexive, nuclear, separable, and unconditional. In addition, a Schwartz space isn't really a space with a property but a specific (and useful) construction of a Frechet space. (Edit: It seems that "Schwartz" means two things, the Schwartz space of rapidly decreasing smooth functions, and the Schwartz property of a locally convex space.)



A discussion of the properties that I think are worth knowing, and why:



  • reflexive. This means a space whose dual is also its pre-dual. If a Banach space has a pre-dual, then its unit ball is compact in the weak-* topology by the Banach-Alaoglu theorem. In particular, the set of Borel probability measures on a compact space is compact. This is important in geometry, for sure. Famously, Hilbert spaces are reflexive. Note also that there is a second important topology, the weak-* topology when a pre-dual exists, which you'd also call the weak topology in the reflexive case. (I am not sure what good the weak topology is when they are different.)


  • separable. As in topology, has a countable dense subset. How much do you use manifolds that do not have a countable dense subset? Inseparable topological vector spaces are generally not that useful either, with the major exception of the dual of a non-reflexive, separable Banach space. For instance $B(H)$, the bounded operators on a Hilbert space, is inseparable, but it is the dual of the Banach space of trace-class operators $B_1(H)$, which is separable.


  • unconditional. It is nice for a Banach space to have a basis, and the reasonable kind is a topological basis, a.k.a. a Schauder basis. The structure does not resemble familiar linear algebra nearly as much if linear combinations are only conditionally convergent. An unconditional basis is an unordered topological basis, and an unconditional space is a Banach space that has one. There is a wonderful structure theorem that says that, up to a constant factor that can be sent to 1, the norm in an unconditional space is a convex function of the norms of the basis coordinates. All unconditional Banach spaces resemble $\ell^p$ in this sense. Note also that there is a non-commutative moral generalization for operators, namely that the norm be spectral, or invariant under the available unitary group.


  • nuclear. Many of the favorable properties of the smooth functions $C^\infty(M)$ on a compact manifold come from or are related to the fact that it is a nuclear Frechet space. For instance, in defining it as a Frechet space (using norms on the derivatives), you notice that the precise norms don't matter much. This doesn't necessarily mean that you should learn the theory of nuclear spaces (since even most analysts don't). But in drawing a line between constructions and properties, opinion seems to be split as to whether the theory of nuclear spaces is tangential or fundamental. Either way, my impression is that the main favorable properties of $C^\infty(M)$ are that it is Frechet, reflexive, and nuclear. On the other hand, infinite-dimensional Banach spaces are never nuclear.



Prompted by Andrew's question, I compiled some data on relationships between types of locally convex topological spaces. In particular, I made a Hasse diagram (drawn with Graphviz) of properties included below. It began as a simplification of the one in EDM, but then I added more conditions. The rule for the Hasse diagram is that I only allow "single-name" properties, not things like Frechet and nuclear, although adverbs are allowed. The graph makes the topic of locally convex spaces look laughably complicated, but that's a little unfair. You could argue that there are more than enough defined properties in the field, but of course mathematicians are always entitled to make new questions. Moreover, if you look carefully, relatively few properties toward the top of the diagram imply most of the others, which is part of the point of my answer. If a topological vector space is Banach or even Frechet, then it automatically has a dozen other listed properties.



I have the feeling that the Hasse diagram has missing edges even for the nodes listed. If someone wants to add comments about that, that would be great. (Or it could make a future MO question.) The harder question is combinations of properties. I envision a computer-assisted survey to compare all possible combinations of important properties, together with citations to counterexamples and open status as appropriate. I started a similar computer-assisted survey of complexity classes a few years ago.



Diagram of some LCTVS properties