# Your name is Patrick too? What are the odds?

So I’ve been trying to work on my networking skills in preparation for a transition to a postdoc by literally chatting up every single person within a 15-foot radius. This is a surprisingly pleasant exercise. The weird thing is, I met no fewer than 3 people named Rachel this week. It’s really not that common a name (in Montreal at least). So I was like, what are the odds?

Turns out I’m a statistics nerd, so I figured I could actually compute this. The US Census Bureau has a list of the 5000 most common boy and girl first names, along with the share of the population that has each one. Looking at the (truncated) probability distribution over names sorted by rank, it is well approximated by a Zipf-Mandelbrot law with q = 55 and s = 1.6 (this was determined by eyeballing, as seen above). Note that the initial outlier is the name Mary.
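For reference, here is the Zipf-Mandelbrot law written out, in the same form the code at the bottom of this post uses. The symbols k (the popularity rank of a name) and K (the number of names in the list) are my own labels; q and s are the parameters quoted above.

$$
p(k) = \frac{(k + q)^{-s}}{\sum_{j=1}^{K} (j + q)^{-s}}, \qquad k = 1, \dots, K, \quad q = 55,\; s = 1.6,\; K = 5000
$$

The offset q flattens the head of the distribution, so the most popular names are not as dominant as a pure power law would make them.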

The rest, as they say, is cake. We can answer a question like: “what are the odds that if you meet N people of the same sex, n or more of them turn out to have the same, relatively uncommon name?” This is easily answered by simulating Zipf-Mandelbrot draws (you can do this through multinomial sampling if you’re lazy like I am). As a yardstick for “relatively uncommon”: 0.5% of girls in the US are called Sarah. So, to answer my question, if you meet 15 girls, the odds that 3 (or more) of them will have the same name, and that this name is less common than Sarah, are about 0.02%.
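To make the multinomial trick concrete, here is a minimal sketch of a single simulated week: draw 15 people at once and see how many landed on each name. It assumes the Statistics and Machine Learning Toolbox (for `mnrnd`), and the variable names (`pmf`, `counts`, `mostShared`, `rankOfName`) are just illustrative.

```matlab
% One simulated week of meeting N people (illustrative sketch)
q = 55; s = 1.6;     % Zipf-Mandelbrot parameters quoted in the post
nmax = 5e3;          % number of names in the census list
N = 15;              % people met in one week

pmf = 1 ./ ((1:nmax) + q).^s;   % unnormalized weights, 1-by-nmax
pmf = pmf / sum(pmf);           % normalize so the probabilities sum to 1

% counts(k) = how many of the N people have the name of popularity rank k
counts = mnrnd(N, pmf);

[mostShared, rankOfName] = max(counts);
fprintf('Most shared name: rank %d, held by %d of %d people\n', ...
    rankOfName, mostShared, N);
```

Repeating this draw many times and counting how often some uncommon name shows up 3 or more times gives the estimate.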

So yes, it’s pretty unlikely: even if you keep meeting 15 new girls every week for 50 years, there’s a bit less than a 1 in 2 chance (about 40%) this will ever happen to you. So, yes, I’m feeling pretty special. Ain’t statistics grand?
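For the curious, here is the back-of-the-envelope version of that lifetime figure. It assumes 52 weeks a year (so W = 2600 weeks over 50 years), that each week is an independent try, and that the weekly odds are the p = 0.02% estimated above.

$$
P(\text{at least once}) = 1 - (1 - p)^{W} = 1 - (1 - 0.0002)^{2600} \approx 0.41
$$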

Here’s the code I used (this is admittedly some of the worst code I’ve ever written):

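```
function [odds] = namesdistro(N,n,cutoff)
%Computes the approximate odds (as a fraction of 1) that after meeting
%N people of the same sex, n or more of these people will have the same
%name, and that name will be shared by less than a fraction cutoff of the
%general population (e.g. cutoff = 0.005 for 0.5%, since it is compared
%directly against the normalized probabilities below)

%do this by simulation
batchsize = 1000;   %number of simulated groups of N people per batch
ndraws = 1e6;       %total number of simulated groups
nmax = 5e3;         %number of distinct names
%Zipf-Mandelbrot probabilities with q = 55, s = 1.6, as a column vector
thepdf = 1./( ((1:nmax)'+55).^1.6 );
thepdf = thepdf/sum(thepdf);

odds = 0;
for ii = 1:ndraws/batchsize
    %ns(k,j) = number of people in simulated group j with name k
    ns = mnrnd(N,thepdf',batchsize)';
    %a group counts if any uncommon name appears at least n times
    odds = odds + sum(any(bsxfun(@and,ns>=n,thepdf < cutoff)))/ndraws;
end
end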
```