Chinese restaurant process

In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a restaurant. Imagine a restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 sits at the first table. The next customer either sits at the same table as customer 1, or the next table. This continues, with each customer choosing to either sit at an occupied table with a probability proportional to the number of customers already there, or an unoccupied table. At time n, the n customers have been partitioned among m ≤ n tables. The results of this process are exchangeable, meaning the order in which the customers sit does not affect the probability of the final distribution. This property greatly simplifies a number of problems in population genetics, linguistic analysis, and image recognition.

Formal definition

For any positive integer n, let \mathcal{P}_{n} denote the set of all partitions of the set \{ 1, 2, 3,..., n \} \triangleq [n]. The Chinese restaurant process takes values in the infinite Cartesian product \prod_{n \geq 1} \mathcal{P}_{n}. The value of the process at time n is a partition B_n of the set [n], whose probability distribution is determined as follows. At time n=1, the trivial partition B_1 = \{ \{ 1 \} \} is obtained (with probability one). At time n+1 the element "n+1" is either: • added to one of the blocks of the partition B_n, where each block is chosen with probability |b|/(n+1) where |b| is the size of the block (i.e. number of elements), or • added to the partition B_n as a new singleton block, with probability 1/(n+1). The random partition so generated has some special properties. It is exchangeable in the sense that relabeling \{ 1,..., n \} does not change the distribution of the partition, and it is consistent in the sense that the law of the partition of [n-1] obtained by removing the element n from the random partition B_n is the same as the law of the random partition B_{n-1}. The probability assigned to any particular partition (ignoring the order in which customers sit around any particular table) is : \Pr(B_n = B) = \frac{\prod_{b\in B} (|b| -1)!}{n!}, \qquad B \in \mathcal{P}_{n} where b is a block in the partition B and |b| is the size of b. The definition can be generalized by introducing a parameter \theta>0 which modifies the probability of the new customer sitting at a new table to \frac{\theta}{n+\theta} and correspondingly modifies the probability of them sitting at a table of size |b| to \frac{n+\theta}. The vanilla process introduced above can be recovered by setting \theta=1. Intuitively, \theta can be interpreted as the effective number of customers sitting at the first empty table. Alternative definition An equivalent, but subtly different way to define the Chinese restaurant process, is to let new customers choose companions rather than tables. Customer n+1 chooses to sit at the same table as any one of the n seated customers with probability \frac{1}{n+\theta}, or chooses to sit at a new, unoccupied table with probability \frac{\theta}{n+\theta}. Notice that in this formulation, the customer chooses a table without having to count table occupancies---we don't need |b|. ==Distribution of the number of tables==

Distribution of the number of tables

{{Probability distribution| name =Chinese restaurant table| type =mass| parameters =\theta > 0 n \in \{1,2,\ldots \}| support =k \in \{1,2,\ldots,n\}| pdf =\frac{\Gamma(\theta)}{\Gamma(n+\theta)} |s(n,k)| \theta^{k} | mean =\theta (\psi(\theta+n)-\psi(\theta))(see digamma function)| }} The Chinese restaurant table distribution (CRT) is the probability distribution on the number of tables in the Chinese restaurant process. It can be understood as the sum of n independent Bernoulli random variables, each with a different parameter: : \begin{align} K & = \sum_{i=1}^n b_i \\[4pt] b_i & \sim \operatorname{Bernoulli} \left( \frac \theta {i-1+\theta}\right) \end{align} The probability mass function of K is given by : f(k) = \frac{\Gamma(\theta)}{\Gamma(n+\theta)} |s(n,k)| \theta^k, \quad k=1,\dots,n, where s denotes Stirling numbers of the first kind. == Two-parameter generalization ==

Two-parameter generalization

This construction can be generalized to a model with two parameters, \theta & \alpha, commonly called the strength (or concentration) and discount parameters respectively. At time n+1, the next customer to arrive finds |B| occupied tables and decides to sit at an empty table with probability : \frac{\theta + |B| \alpha}{n + \theta}, or at an occupied table b of size |b| with probability : \frac)^{\underline m} & \text{if }k where x^{\overline m}=\prod_{i=0}^{m-1}(x+i) is the rising factorial and x^{\underline m}=\prod_{i=0}^{m-1}(x-i) is the falling factorial. It is worth noting that for the parameter setting where \alpha and \theta = -L\alpha, then (\theta + \alpha)_\,\Gamma(\theta/\alpha + |B|) }{\Gamma(\theta/\alpha)}\prod_{b\in B}\dfrac{\Gamma(|b|-\alpha)}{\Gamma(1-\alpha)}. In the one-parameter case, where \alpha is zero, and \theta>0 this simplifies to : \Pr(B_n = B\mid\theta) = \frac{\Gamma(\theta)\,\theta^}{\Gamma(\theta+n)}\prod_{b\in B} \Gamma(|b|). Or, when \theta is zero, and 0 : \Pr(B_n = B\mid\alpha) =\frac{\alpha^ }}{(L\gamma)^{\overline n}} where P(\ell_1)=\frac1L and x^{\overline m}=\prod_{i=0}^{m-1}(x+i) is the rising factorial. In general, there are however multiple label states that all correspond to the same partition. For a given partition, B, which has \left|B\right|\le L blocks, the number of label states that all correspond to this partition is given by the falling factorial, L^{\underline{\left|B\right|} }=\prod_{i=0}^{\left|B\right|-1}(L-i). Taking this into account, the probability for the partition is : \text{Pr}(B_n=B\mid\gamma,L) = L^{\underline{\left|B\right|}}\,\frac{\prod_{i=1}^L\gamma^{\overline{\left|{b_i}\right|} }}{(L\gamma)^{\overline n}} which can be verified to agree with the general version of the partition probability that is given above in terms of the Pochhammer k-symbol. Notice again, that if B is outside of the support, i.e. |B|>L, the falling factorial, L^{\underline} evaluates to zero as it should. (Practical implementations that evaluate the log probability for partitions via \log L^{\underline}=\log\left|\Gamma(L+1)\right|-\log\left|\Gamma(L+1-|B|)\right| will return -\infty, whenever |B|>L, as required.) Relationship between Dirichlet-categorical and one-parameter CRP Consider on the one hand, the one-parameter Chinese restaurant process, with \alpha=0 and \theta>0, which we denote \text{CRP}(\alpha=0,\theta); and on the other hand the Dirichlet-categorical model with L a positive integer and where we choose \gamma=\frac{\theta}{L}, which as shown above, is equivalent to \text{CRP}(\alpha=-\frac{\theta}{L},\theta). This shows that the Dirichlet-categorical model can be made arbitrarily close to \text{CRP}(0,\theta), by making L large. Stick-breaking process The two-parameter Chinese restaurant process can equivalently be defined in terms of a stick-breaking process. For the case where 0\le\alpha and \theta>-\alpha, the stick breaking process can be described as a hierarchical model, much like the above Dirichlet-categorical model, except that there is an infinite number of label states. The table labels are drawn independently from the infinite categorical distribution \mathbf p=(p_1,p_2,\ldots), the components of which are sampled using stick breaking: start with a stick of length 1 and randomly break it in two, the length of the left half is p_1 and the right half is broken again recursively to give p_2,p_3,\ldots. More precisely, the left fraction, f_k, of the k-th break is sampled from the beta distribution: : f_k\sim B(1-\alpha,\theta+k\alpha),\; \text{for }k\ge1\text{ and }0\le\alpha The categorical probabilities are: : p_k=f_k\prod_{i=1}^{k-1}(1-f_k),\;\text{where the empty product evaluates to one.} For the parameter settings \alpha and \theta=-\alpha L, where L is a positive integer, and where the categorical is finite: \mathbf p=(p_1,\ldots, p_L), we can sample \mathbf p from an ordinary Dirchlet distribution as explained above, but it can also be sampled with a truncated stick-breaking recipe, where the formula for sampling the fractions is modified to: : f_k \sim B(-\alpha, \theta+k\alpha),\;\text{for }1\le k\le L-1\text{ and }\alpha and f_L=1. ==The Indian buffet process==

The Indian buffet process

It is possible to adapt the model such that each data point is no longer uniquely associated with a class (i.e., we are no longer constructing a partition), but may be associated with any combination of the classes. This strains the restaurant-tables analogy and so is instead likened to a process in which a series of diners samples from some subset of an infinite selection of dishes on offer at a buffet. The probability that a particular diner samples a particular dish is proportional to the popularity of the dish among diners so far, and in addition the diner may sample from the untested dishes. This has been named the Indian buffet process and can be used to infer latent features in data. == Applications ==

Applications

The Chinese restaurant process is closely connected to Dirichlet processes and Pólya's urn scheme, and therefore useful in applications of Bayesian statistics including nonparametric Bayesian methods. The Generalized Chinese Restaurant Process is closely related to Pitman–Yor process. These processes have been used in many applications, including modeling text, clustering biological microarray data, biodiversity modelling, and image reconstruction == See also ==

Source: Wikipedia ↗

tickerdossier.com tickerdossier.substack.com