1. INTRODUCTION

The concept of assortative mating was first formalized by Becker (1973, 1974), and since then it has been widely studied by economists and sociologists alike. At heart, the idea behind assortative mating is simple: if the marginal “surplus” generated from mating is increasing in an attribute of a partner (for us, education), then in a stable equilibrium an educated male matches with an educated female, and vice versa^{1}. Most of the earlier work in this area has focused on testing for such assortative mating, or sorting, in the marriage and labor markets. Even though the idea is simple, finding support for sorting in the data is known to be difficult (Choo and Siow, 2006; Eeckhout and Kircher, 2011; Siow, 2015)^{2}.

There is a substantial body of work on this subject for the U.S., both theoretical and applied. Evidence for the U.S. suggests that mating is assortative (Choo and Siow, 2006; Siow, 2015) and that its degree has increased over time (Greenwood, Guner, Kocharkov, and Santos, 2014). By contrast, to the best of our knowledge, there is no such evidence for Argentina. This paper thus seeks to contribute to this literature by investigating whether mating is assortative and, if so, whether the degree of positive matching has changed over time.

We study marriage patterns in Argentina and their implications (if any) for couple formation. Using Argentine household survey data for Greater Buenos Aires from 1980 to 2014, we conduct several tests for positive assortative mating. We focus exclusively on Greater Buenos Aires in order to minimize errors in defining a “marriage market”^{3}. We also restrict our study to years that are not associated with major macroeconomic crises in Argentina. Our results indicate that: a) husbands' education has a positive effect on wives' education in the years considered; b) there is evidence that more educated people marry more educated people, and vice versa; and c) there is only a “weak” pattern of total positivity of order 2 (TP2, i.e., all local log odds ratios positive).

Knowing more about the characteristics of marriage is important for several reasons. A particularly important one is the link between household formation and the optimal tax scheme. For example, if one wants to consider taxation for couples, assortative mating (Becker, 1973, 1974) suggests that the partners' types are dependent, and it is hard to justify the claim that the government knows the joint density of these types. In an important paper, Kleven, Kreiner, and Saez (2009) consider optimal income taxation for couples and show that when the types are independent (when low- and high-type females are equally likely to marry high- and low-type males) the optimal tax schedule should exhibit negative jointness^{4}: the marginal tax rate for one spouse should go down when the income of the other spouse goes up. Frankel (2014) allows for assortative mating with discrete types and finds that, although optimal taxation should exhibit negative jointness when one's type is uncorrelated with one's spouse's, negative jointness attenuates as the level of assortative mating increases, and for a large enough correlation between types it disappears entirely. So in an economy where couples have highly correlated types, the government should use a separable tax code. Chade and Ventura (2002) find that tax reforms can have substantial effects on female labor supply and on the degree of assortative mating. This is particularly relevant because different countries use different taxation rules. For instance, in the U.S. couples can be treated as a single unit for tax purposes, while in Argentina spouses file separately.

Another important reason to learn more about marriage is its connection to income distribution. Numerous papers have tried to explain the distribution of income and its determinants in Argentina (see Cruces and Gasparini, 2010). However, there is little empirical work that exploits the link between inequality and assortative mating. Another channel has to do with the effects of education, in particular the distributive effects of marriages positively sorted along the couple's education. Since the education premium is positive (Card, 2001) and growing over time (Katz and Autor, 1999; Acemoglu, 2002; Kaymak, 2009), assortative mating will be associated with increasing income inequality, and this link between assortative mating and (high) income inequality is in fact causal.

Finally, exploring assortative mating is relevant to demographic explanations of how households are formed, that is, of how marriages come about.

The remainder of the paper is structured as follows. Section 2 describes the sample and provides summary statistics. Section 3 presents our empirical strategy and results: the regression approach, the tests for positive assortative mating, and a discussion of our findings. Finally, Section 4 concludes and indicates future lines of research.

2. DATA

We use data for Greater Buenos Aires (GBA) from the National Permanent Household Survey (EPH) of Argentina for the years 1980-2014. We restrict our attention to GBA because complete data are available for it, especially for the early years of our sample, which is not the case for other urban areas in those years. The EPH contains detailed individual- and household-level information such as gender, marital status, education, wages, and (total) family income.

In order to study the marriage market between 1980 and 2014, we keep in our sample only the years 1980, 1986, 1992, 1998, 2004, 2010, and 2014^{5}. Since we are interested in marriages, we restrict our attention to the subsample of couples (whether legally married or not). Therefore, we do not, and cannot, explain why some people stay single or why some couples divorce. We further restrict ourselves to households where both partners are between 25 and 60 years old and at least one of them has positive labor income. This leaves us with about 1,600 observations per year on average, with the largest number (2,154) in 2014 and the smallest (1,231) in 1998. Ideally we would prefer a narrower age range for the couples, but this would mean working with significantly smaller samples. Since our main exercises are based on nonparametric techniques, we chose a wider age range in order to avoid the loss of precision that small samples would entail.
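The sample restrictions above can be sketched as a simple filter. This is a minimal illustration only: the `Couple` record and its field names are hypothetical and do not correspond to actual EPH variable names, and we assume spouses have already been paired within each household.

```python
from dataclasses import dataclass


@dataclass
class Couple:
    """Hypothetical couple record; field names are illustrative, not EPH variables."""
    husband_age: int
    wife_age: int
    husband_labor_income: float
    wife_labor_income: float


def in_sample(c: Couple, min_age: int = 25, max_age: int = 60) -> bool:
    """Keep couples where both partners are aged min_age..max_age
    and at least one partner has positive labor income."""
    ages_ok = all(min_age <= age <= max_age for age in (c.husband_age, c.wife_age))
    income_ok = c.husband_labor_income > 0 or c.wife_labor_income > 0
    return ages_ok and income_ok


couples = [
    Couple(30, 28, 1500.0, 0.0),    # kept
    Couple(62, 55, 2000.0, 900.0),  # dropped: husband older than 60
    Couple(40, 38, 0.0, 0.0),       # dropped: no labor income
]
sample = [c for c in couples if in_sample(c)]
print(len(sample))  # 1
```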

As Table 1 shows, the average years of schooling of wives increased from 7.76 years (completion of elementary school) in 1980 to almost 12 years (completion of high school) in 2014. The average years of schooling of husbands also increased, from 8.27 in 1980 to 11.11 in 2014, although by a smaller amount. In particular, between 2004 and 2014, wives' average years of schooling overtook those of husbands. We also see a change in the gender of the household head: the proportion of households with a male head has decreased over the years, from 99% in 1980 and 1986 to 85% in 2014. In line with this, there has been a significant increase in female labor force participation, as well as in women's share of household income (which roughly doubled between 1980 and 2014).

To explore the level of education of married couples further, we divide the sample into six levels of education: incomplete elementary school (IES); complete elementary school (CES); incomplete high school (IHS); complete high school (CHS); incomplete college (IC); and complete college or more (CC+). Table 2 shows the percentage of couples with each combination of education levels in 1980 and 2014. For instance, in 1980, 15.01%, 23%, 3.47%, 4.33%, 1.18%, and 1.44% of couples were such that both partners had IES, CES, IHS, CHS, IC, and CC+ levels of education, respectively. As can be seen, in both 1980 and 2014 the diagonal entries (in bold) are the largest in their row and column, which suggests sorting along education. Although we do not present the data for other years, they all exhibit a similar pattern. We can also see that the share of couples in which both partners are highly educated has increased over time. In particular, the proportion of marriages in which both members have the highest level of education increased from 1.44% in 1980 to 13.77% in 2014.

Note: Each entry shows the percentage of the total sample corresponding to a given combination of husband's and wife's education levels. For instance, in 1980, among couples in which the wife had Incomplete Elementary School (IES), those in which the husband had IES account for 15.01% of the sample, Complete Elementary School (CES) 9.44%, Incomplete High School (IHS) 1.83%, Complete High School (CHS) 0.13%, Incomplete College (IC) 0.07%, and Complete College or more (CC+) 0.07%.

3. EMPIRICAL STRATEGY AND RESULTS

In this section we explain the methodology and present the results of a series of exercises that test the positive assortative mating hypothesis, which posits that marriages are positively sorted along the couples' education.

First, we adopt a parametric perspective, using a regression approach to assess assortative mating while controlling for observed and unobserved characteristics. Then, to provide further evidence on assortative mating, we use a set of nonparametric tests. In particular, we empirically verify whether the observed matching distribution is supermodular, i.e., whether the local log odds ratios of education are all positive.

3.1. Testing Assortative Mating

**Regression approach**. As a first (parametric) exercise to assess assortative mating, we regress the wife's education on the husband's education, controlling for other covariates. In particular, we use the following specification:

$$E^{w}_{m} = \alpha + \theta E^{h}_{m} + X_{m}'\beta + \varepsilon_{m}, \qquad (1)$$

where the subscript *m* denotes the *m*th couple and the superscripts *w* and *h* denote wife and husband, respectively. That is, $E^{w}_{m}$ is the number of years of education of the wife in couple *m*, and likewise $E^{h}_{m}$ is her husband's years of education; α is a constant, β a vector of coefficients, and $\varepsilon_{m}$ an error term. In addition, we control for the age gap, the presence of children in the household, and the gender of the household head^{6}. We collect all these in the vector $X_{m}$^{7}.
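As an illustration only, the OLS estimate of θ in the regression of the wife's years of education on the husband's, E^w = α + θE^h + X'β + ε, can be sketched with NumPy least squares. The data below are simulated under the assumption that the true θ is 0.7, and the controls are hypothetical stand-ins for the age gap, the presence of children, and the gender of the household head; this is not the authors' code.

```python
import numpy as np


def ols_theta(e_wife, e_husband, controls):
    """OLS estimate of theta in E^w = alpha + theta * E^h + X'beta + error."""
    n = len(e_wife)
    design = np.column_stack([np.ones(n), e_husband, controls])
    coef, *_ = np.linalg.lstsq(design, e_wife, rcond=None)
    return coef[1]  # coefficient on the husband's years of education


rng = np.random.default_rng(0)
n = 1600                                   # roughly the average yearly sample size
e_h = rng.integers(0, 18, size=n).astype(float)
controls = np.column_stack([
    rng.normal(2.0, 3.0, size=n),          # age gap (hypothetical distribution)
    rng.integers(0, 2, size=n),            # children present (0/1)
    rng.integers(0, 2, size=n),            # male household head (0/1)
])
e_w = 1.0 + 0.7 * e_h + controls @ np.array([0.05, -0.3, 0.2]) + rng.normal(0.0, 2.0, size=n)

theta_hat = ols_theta(e_w, e_h, controls)
print(f"theta_hat = {theta_hat:.3f}")
```

With exogenous husband's education, as simulated here, the estimate should land close to the assumed 0.7; the next paragraphs explain why that exogeneity is doubtful in real data.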

Our parameter of interest is θ, which measures the degree of assortative mating. Greenwood, Guner, Kocharkov, and Santos (2014) use a similar strategy, but they ignore the possibility that the husband's education might be endogenous. A large number of findings in the literature cast doubt on the exogeneity of the husband's education. For example, the wife's family and social background could affect both her education and her choice of a more educated partner. Also, the wife's preferences about education could affect her husband's education: wives who care more about education may select more educated husbands or encourage their husbands to study. Moreover, both husbands and wives may take into account how their pre-marital education decisions affect their (unobserved) bargaining power within marriage.

To correct for this endogeneity, we estimate equation (1) by instrumental variables (IV), but finding a valid instrument is not straightforward. We need an instrument that is strongly correlated with the endogenous variable (i.e., the husband's education) and that satisfies the exclusion restriction (i.e., has no direct effect on the wife's education). It is well known that income is highly correlated with education, which means that the husband's income will be highly correlated with his education, so we consider the husband's log-income as an instrumental variable. Table 3 reports Spearman (rank) correlations, which show that the correlation between the husband's log-income and his education is different from zero. As expected, the correlation coefficient is positive, at around 0.50. Also, more educated husbands show higher correlation coefficients. These features suggest that this variable could be used as an instrument.
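The relevance check based on Spearman's rank correlation can be sketched in NumPy: rank both variables, giving tied values their average rank (which matters here because years of schooling take few distinct values), and compute the Pearson correlation of the ranks. The simulated education and log-income data are hypothetical.

```python
import numpy as np


def average_ranks(x):
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):             # average ranks within each tie group
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


rng = np.random.default_rng(1)
educ = rng.integers(0, 18, size=1000)                          # years of schooling
log_income = 6.0 + 0.1 * educ + rng.normal(0.0, 0.6, size=1000)
rho = spearman(log_income, educ)
print(f"Spearman rho = {rho:.2f}")
```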

To further assess the validity of our instrument, we also estimate the model using three subsamples: one in which the wife has less than high school, another in which she has less than college (including high school), and finally one in which she attains the highest level of education (more than college). As shown in Tables A1, A2, and A3 in the Appendix, the identifying power of the instrument relies heavily on the variation in husbands' education and income for wives with lower levels of education (first-stage F statistics larger than 10).
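A minimal two-stage least squares (2SLS) sketch with one excluded instrument and its first-stage F statistic, assuming homoskedastic errors. The data are simulated: an unobserved “background” variable raises both spouses' education, biasing OLS upward, while the stand-in instrument satisfies the exclusion restriction by construction (which, as discussed in the text, cannot be tested in the actual data). Standard errors are deliberately not computed, because the naive second-stage OLS standard errors are not valid for 2SLS.

```python
import numpy as np


def _fit(y, design):
    """OLS fit; returns (sum of squared residuals, fitted values)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    resid = y - fitted
    return float(resid @ resid), fitted


def tsls_theta(e_wife, e_husband, instrument, controls):
    """2SLS estimate of theta and first-stage F for one excluded instrument."""
    n = len(e_wife)
    exog = np.column_stack([np.ones(n), controls])
    first = np.column_stack([exog, instrument])
    ssr_u, e_h_hat = _fit(e_husband, first)   # first stage with the instrument
    ssr_r, _ = _fit(e_husband, exog)          # first stage without it
    f_stat = (ssr_r - ssr_u) / (ssr_u / (n - first.shape[1]))  # q = 1 restriction
    second = np.column_stack([np.ones(n), e_h_hat, controls])
    coef, *_ = np.linalg.lstsq(second, e_wife, rcond=None)
    return coef[1], f_stat


rng = np.random.default_rng(2)
n = 1600
controls = rng.normal(size=(n, 2))          # hypothetical controls
z = rng.normal(size=n)                      # stand-in for husband's log-income
u = rng.normal(size=n)                      # unobserved family background
e_h = 10 + 2.0 * z + u + rng.normal(size=n)
e_w = 1 + 0.7 * e_h + 1.5 * u + controls @ np.array([0.2, -0.1]) + rng.normal(size=n)

theta_iv, f_first = tsls_theta(e_w, e_h, z, controls)
ols_design = np.column_stack([np.ones(n), e_h, controls])
theta_ols = np.linalg.lstsq(ols_design, e_w, rcond=None)[0][1]
print(f"OLS {theta_ols:.2f}, 2SLS {theta_iv:.2f}, first-stage F {f_first:.0f}")
```

In this simulated design OLS overstates θ because of the omitted background variable, while 2SLS recovers a value near the assumed 0.7; the F statistic compares the first-stage fit with and without the excluded instrument.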

A valid instrument must also satisfy the exclusion restriction. For the husband's income to be a good instrument, it should affect the wife's education only through the husband's education. The exclusion restriction relies on the timing of the decisions to go to school and to marry, because (in all likelihood) acquiring education comes before the decision to marry. We have in mind a situation in which the wife decides her years of schooling while still single, possibly in consultation with her family of origin, and not jointly with her future husband. This is consistent with the theory of assortative mating, in which the decision to marry someone is based solely on education, i.e., one marries only after one's education is complete. Since we do not model the dynamic decision of education choice and mating, we cannot directly test the validity of this exclusion restriction. The main caveat concerns those who pursue higher education. It is possible, for instance, that in our data the husband's income influences the marginal decision (intensive margin) to obtain an advanced degree, such as an MBA or an M.D.

*Results.* Table 4 presents the regression estimates for the entire sample. Tables A1, A2, and A3 in Appendix A1 provide the estimates for different educational groups (i.e., varying sample sizes). In all cases the effect of the husband's education on the wife's education is positive. For the entire sample, the OLS estimate of the husband's education coefficient (θ̂) ranges between 0.585 and 0.722, and the IV estimates range between 0.805 and 1.038, so they are higher than the OLS estimates. The coefficients are statistically significant. We thus find evidence supporting a pattern of assortative couple formation along education. However, we do not find strong evidence of an increasing pattern over the years, since the estimated θ̂'s do not increase steadily through time.

As a robustness check, we also present results for couples between 25 and 40 years old, who can be considered more homogeneous (in terms of marriage spell). These results are reported in Table 5 and are qualitatively similar to those for the entire sample. The IV estimates are positive and significant. Comparing IV with OLS estimates, we again find that the estimated degree of assortative mating is larger with IV than with OLS in every year of the sample.

**Local odds ratio approach**. The parametric approach provides evidence supporting assortative mating. In this section, we test different models of assortative mating and assess which of them best describes our data. In particular, we are interested in testing Becker's theory of Perfect Positive Assortative Matching (PAM).

From Becker (1973, 1974) we know that in a static model with transferable utility, where the match output function is supermodular in agents' ability (e.g., education), there is PAM in equilibrium. Moreover, PAM is independent of the population distribution of wives and husbands and of the number of categories (levels of education) considered. Also, PAM can be assessed by looking at a subset of the population, and it imposes no restrictions on the unmatched. In a recent paper, Siow (2015) develops a stochastic version of the Becker model with the same predictions as the original one, together with statistical tools to test PAM that are more powerful than simple correlation tests. He also shows how to empirically distinguish between PAM and preferences for own type by using the concept of supermodularity of the marital output function. In this set of exercises we closely follow Siow (2015).

To describe some patterns of the marriage market, let μ*(i,j)* be the number of couples in which the husband has education level *i* and the wife has education level *j*, where *i,j* ∈ *{IES, CES, IHS, CHS, IC, CC+}*. Then, we can define a 6x6 matrix μ, known as the equilibrium matching distribution, whose typical element is μ*(i,j)* (see Table 2). A local measure of association in μ can be computed using local log odds ratios, where the {i,j} local log odds ratio is defined as

$$l(i,j) = \ln\frac{\mu(i,j)\,\mu(i+1,j+1)}{\mu(i,j+1)\,\mu(i+1,j)}, \qquad i,j = 1,\dots,5.$$

As emphasized by Siow (2015), there is no loss of information in considering local log odds ratios rather than μ. We calculate a 5x5 matrix of local log odds ratios for each year. The ratios for 1980 and 2014 are reported in Table 6; those for the remaining years are in Appendix A1. Each entry is the estimated log odds ratio for a pair of education levels, with bootstrapped standard errors (1000 replications) in parentheses.
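The local log odds ratios, l(i,j) = ln[μ(i,j)μ(i+1,j+1) / (μ(i,j+1)μ(i+1,j))], and their bootstrap standard errors can be sketched as follows. We assume the bootstrap resamples couples with replacement, which is equivalent to drawing a multinomial table with the observed cell shares; the paper does not spell out its resampling scheme, and the 3x3 table below is hypothetical. Cells with zero counts make a ratio undefined, which the sketch reports as NaN (the “n.a.” entries).

```python
import numpy as np


def local_log_odds(mu):
    """(I-1)x(I-1) matrix of l(i,j) = ln[mu(i,j) mu(i+1,j+1) / (mu(i,j+1) mu(i+1,j))]."""
    mu = np.asarray(mu, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        logs = np.log(mu)
        l = logs[:-1, :-1] + logs[1:, 1:] - logs[:-1, 1:] - logs[1:, :-1]
    l[~np.isfinite(l)] = np.nan  # zero cells: ratio undefined ("n.a.")
    return l


def bootstrap_se(mu, reps=1000, seed=0):
    """Bootstrap standard errors of l(i,j), resampling couples with replacement."""
    mu = np.asarray(mu, dtype=float)
    n_couples = int(mu.sum())
    shares = (mu / mu.sum()).ravel()
    rng = np.random.default_rng(seed)
    draws = np.empty((reps, mu.shape[0] - 1, mu.shape[1] - 1))
    for r in range(reps):
        table = rng.multinomial(n_couples, shares).reshape(mu.shape)
        draws[r] = local_log_odds(table)
    return np.nanstd(draws, axis=0, ddof=1)  # ignore replicates with empty cells


# Hypothetical 3x3 table (husband's level in rows, wife's in columns).
table = np.array([[50, 20, 5], [20, 60, 20], [5, 20, 50]])
print(local_log_odds(table))
print(bootstrap_se(table))
```

For a table of independent (randomly matched) couples every l(i,j) is zero, which is the benchmark the next paragraph contrasts with sorting.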

Note: Bootstrap standard errors (1000 replications) in parentheses. (***) denotes p-value < 0.01, (**) denotes p-value < 0.05, (*) denotes p-value < 0.1. The entry ‘n.a.’ refers to cases with insufficient observations.

Our working hypothesis is that if marriage were sorted along education, we would expect positive log odds ratios (i.e., *l(i,i)* > 0) when the level of education is the same for both partners. On the other hand, if marriage were (uniformly) random, the ratios would all be equal to zero.

We indeed find that the (unrestricted) log odds ratios along the main diagonal are all greater than zero in both years. In 1980, 11 log odds ratios are significantly different from zero, 5 of which lie along the main diagonal. However, 3 off-diagonal log odds ratios are negative. In 2014 this pattern is repeated, with 11 local log odds ratios significantly different from zero. Along the main diagonal we again obtain positive ratios, while 2 off-diagonal ratios are negative. This is preliminary evidence that random matching does not describe couple formation.

We now explore formally whether the sorting pattern found in Table 2 is strong enough to suggest assortative mating based on education. We compare different models of marriage matching to see which one best describes our data. We start with the following definitions.

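**Definition of TP2:** μ is Totally Positive of Order 2 if:

$$l(i,j) \ge 0 \quad \text{for all } i,j = 1,\dots,I-1.$$

**Definition of DP2:** The *I* x *I* matrix μ is Diagonal Positive of Order 2 if:

$$l(i,i) \ge 0 \quad \text{for all } i = 1,\dots,I-1.$$

**Definition of DPNE:** The *I* x *I* matrix μ is Diagonal Positive and Negative Elsewhere if:

$$l(i,i) \ge 0 \ \text{for all } i, \qquad l(i,j) < 0 \ \text{for all } i \ne j.$$

**Definition of DP0E:** The *I* x *I* matrix μ is Diagonal Positive and Zero Elsewhere if:

$$l(i,i) \ge 0 \ \text{for all } i, \qquad l(i,j) = 0 \ \text{for all } i \ne j.$$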

TP2 is a common, strong measure of positive assortative matching (PAM) used in the statistical literature (Douglas et al., 1991; Shaked and Shanthikumar, 2007). As can be seen, TP2 is stronger than DP2: it restricts all local log odds ratios, not only the diagonal ones. Another interesting pattern of matching is DPNE, in which the log odds ratios are positive along the main diagonal and negative elsewhere. Finally, DP0E assumes random matching for off-diagonal couples.
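To make the four sign patterns concrete, a small helper can report which restrictions a matrix of local log odds ratios satisfies, using the conditions stated in the text: TP2 requires l(i,j) ≥ 0 everywhere; DP2 only l(i,i) ≥ 0; DPNE adds l(i,j) < 0 off the diagonal; DP0E adds l(i,j) = 0 off the diagonal. This is only a descriptive check on point estimates, not the formal LR and MRE tests: sampling noise alone will almost never produce exact zeros, so DP0E essentially never holds for an unrestricted estimate. The function name and example matrices are illustrative.

```python
import numpy as np


def satisfied_patterns(l):
    """Report which sign restrictions a square matrix of local log odds ratios satisfies."""
    l = np.asarray(l, dtype=float)
    off_diag = l[~np.eye(l.shape[0], dtype=bool)]
    diag_ok = bool(np.all(np.diag(l) >= 0))
    return {
        "TP2": bool(np.all(l >= 0)),                     # every l(i,j) >= 0
        "DP2": diag_ok,                                  # only l(i,i) >= 0
        "DPNE": diag_ok and bool(np.all(off_diag < 0)),  # plus l(i,j) < 0, i != j
        "DP0E": diag_ok and bool(np.all(off_diag == 0)), # plus l(i,j) = 0, i != j
    }


# Hypothetical 2x2 example (the paper's matrices are 5x5).
print(satisfied_patterns([[0.5, -0.2], [-0.1, 0.8]]))
# {'TP2': False, 'DP2': True, 'DPNE': True, 'DP0E': False}
```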

DP2, DPNE, and DP0E can be rationalized by a model of preference for own type (for details see Siow, 2015). This is important since it is possible to have preference for own type without PAM in the data. One way to model preference for own type is by means of a penalty function reflecting that marital output is higher the more similar the spouses are. In other words, preference for own type essentially imposes restrictions on the log odds ratios along the main diagonal (they should be positive). The difference between TP2 and these models (DP2, DPNE, and DP0E) lies in the maintained assumption about the “non-similar” matches (off-diagonal positions). In the DPNE model, one assumes that there is no complementarity in marital output for couples in off-diagonal positions, while in the DP0E model one assumes random matching outside the main diagonal. DP2 does not restrict the off-diagonal log odds ratios.

We compare each model described above with an unrestricted model, i.e., one that does not impose any restriction on the sign of the log odds ratios. For example, imposing non-negative diagonal terms delivers the (restricted) DP2 model. If we further impose non-negative off-diagonal elements, we obtain the (restricted) TP2 model. If the diagonal terms are restricted to be non-negative but the off-diagonal ones are negative, we obtain the DPNE model. Finally, when the off-diagonal terms are restricted to be zero while the diagonal ones are non-negative, we have the DP0E model. We compare each case by means of a log-likelihood ratio (LR) statistic and a mean relative error (MRE) test.

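*Tests.* Let *N* be the sample size and $n_{ij}$ the observed number of marriages in which the husband has education level *i* and the wife has education level *j*. We assume that each marriage is an independent draw from a multinomial distribution with cell probabilities $\mu(i,j)/N$. Then, the unrestricted model is

$$\max_{\{\mu(i,j)\}} \; \sum_{i=1}^{I}\sum_{j=1}^{I} n_{ij}\,\ln\frac{\mu(i,j)}{N} \quad \text{s.t.} \quad \sum_{i=1}^{I}\sum_{j=1}^{I}\mu(i,j) = N. \qquad (2)$$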

The restricted models correspond to TP2, DP2, DPNE, and DP0E. For the TP2 model we add the restriction that all log odds ratios are nonnegative (i.e., *l(i,j)* ≥ 0). For the DP2 model the restrictions apply only to the diagonal terms, *l(i,i)* ≥ 0. Finally, for the DPNE and DP0E models there are two kinds of restrictions: the first are the same as in the DP2 case, and the second impose that the off-diagonal log odds ratios are all negative (*l(i,j)* < 0, *i* ≠ *j*) for the DPNE model and zero (*l(i,j)* = 0, *i* ≠ *j*) for the DP0E model.

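It is well known that maximizing a log-likelihood can be difficult when the problem involves a substantial number of restrictions, as in our case. In practice, one solution is to rewrite problem (2) as a geometric programming problem, which involves a minimization (see Lim, Wang, and Choi (2009) and Boyd, Kim, Vandenberghe, and Hassibi (2007)). For the TP2 model the restricted geometric programming problem is

$$\min_{\{\mu(i,j)\}} \; \prod_{i=1}^{I}\prod_{j=1}^{I} \mu(i,j)^{-n_{ij}} \quad \text{s.t.} \quad \frac{1}{N}\sum_{i=1}^{I}\sum_{j=1}^{I}\mu(i,j) \le 1, \qquad \frac{\mu(i,j+1)\,\mu(i+1,j)}{\mu(i,j)\,\mu(i+1,j+1)} \le 1 \ \text{ for all } i,j = 1,\dots,I-1.$$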

Once this problem is solved, we obtain the optimal values $\mu_{r}(i,j)$, which can be used to compute the maximized log-likelihood value ($L_{r}$)^{8}. We proceed similarly for the DP2, DPNE, and DP0E models. Then, we use an LR test defined by

$$LR = 2\,(L_{u} - L_{r}),$$

where $L_{u}$ is the maximized log-likelihood for the unrestricted model and $L_{r}$ corresponds to the restricted version.
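The LR statistic, LR = 2(L_u - L_r), with multinomial log-likelihood Σ n_ij ln(μ(i,j)/N), can be sketched as follows. The restricted fits for TP2, DP2, DPNE, and DP0E require the geometric programming solver, so the example swaps in a simpler restricted model with a closed-form MLE: random matching (all local log odds ratios equal to zero, i.e., spouses' education independent), whose fitted counts are row total × column total / N. The unrestricted fit simply reproduces the observed counts; the table is hypothetical.

```python
import numpy as np


def multinomial_loglik(counts, fitted):
    """Sum over cells of n_ij * ln(fitted(i,j) / N); empty cells contribute zero."""
    counts = np.asarray(counts, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    N = counts.sum()
    mask = counts > 0
    return float(np.sum(counts[mask] * np.log(fitted[mask] / N)))


def lr_statistic(counts, fitted_restricted):
    """LR = 2 (L_u - L_r); the unrestricted MLE reproduces the observed counts."""
    L_u = multinomial_loglik(counts, counts)
    L_r = multinomial_loglik(counts, fitted_restricted)
    return 2.0 * (L_u - L_r)


def independence_fit(counts):
    """Closed-form MLE under random matching: row total * column total / N."""
    counts = np.asarray(counts, dtype=float)
    return np.outer(counts.sum(axis=1), counts.sum(axis=0)) / counts.sum()


table = np.array([[50, 20, 5], [20, 60, 20], [5, 20, 50]])  # hypothetical counts
print(lr_statistic(table, independence_fit(table)))
```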

In this test, the restricted model is the null hypothesis and the unrestricted model is the alternative. To assess the results we use a parametric bootstrap to obtain the corresponding p-values^{9}. Since in large samples the power of this test is close to one, we also conduct a second test that is not sensitive to sample size, namely the MRE (mean relative error) test, defined by
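A parametric bootstrap p-value for an LR test can be sketched as follows: simulate tables from the fitted restricted model, refit both models on each simulated table, and compare the simulated LR statistics with the observed one. The sketch again uses the closed-form random-matching model as the null instead of the paper's TP2, DP2, DPNE, and DP0E models (each of which would require re-solving the geometric program in every replication); the (k+1)/(B+1) p-value convention and the example tables are our own choices.

```python
import numpy as np


def _loglik(n, fitted):
    """Multinomial log-likelihood sum n_ij ln(fitted_ij / N), skipping empty cells."""
    N = n.sum()
    mask = n > 0
    return float(np.sum(n[mask] * np.log(fitted[mask] / N)))


def _independence_fit(n):
    """Random-matching MLE: row total * column total / N."""
    return np.outer(n.sum(axis=1), n.sum(axis=0)) / n.sum()


def _lr(n):
    """LR = 2 (L_u - L_r) for the random-matching null."""
    return 2.0 * (_loglik(n, n) - _loglik(n, _independence_fit(n)))


def parametric_bootstrap_pvalue(counts, reps=999, seed=0):
    """Observed LR and bootstrap p-value, simulating under the fitted null."""
    n = np.asarray(counts, dtype=float)
    lr_obs = _lr(n)
    mu_r = _independence_fit(n)
    N = int(n.sum())
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(reps):
        sim = rng.multinomial(N, (mu_r / N).ravel()).reshape(n.shape).astype(float)
        if _lr(sim) >= lr_obs:
            exceed += 1
    return lr_obs, (exceed + 1) / (reps + 1)


lr_obs, p = parametric_bootstrap_pvalue([[50, 20, 5], [20, 60, 20], [5, 20, 50]])
print(f"LR = {lr_obs:.1f}, bootstrap p-value = {p:.3f}")
```

A strongly diagonal table like this one yields a large LR that simulated tables under random matching essentially never reach, so random matching is rejected.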

where μ(*i, j*) are obtained from the data and μ_r(*i, j*) are the solutions to the geometric programming problem of each restricted model. Note that MRE is zero if the restricted model fits the data perfectly.

*Results.* The following table reports the results of the tests described above.

The tests for 1980 show that the model that best describes our data is DP2: the LR statistic for this model is 0.0000 (that is, the unrestricted estimates already satisfy the diagonal restrictions) and the p-value is 0.2140. For TP2 the LR statistic is 15.598, with a p-value of 0.023. Thus the null hypothesis of TP2 against the unrestricted model is rejected at the 5% level but not at the 1% level. DPNE and DP0E are rejected at all conventional levels. We therefore conclude that there is no evidence against DP2 and only weak evidence against TP2.

To summarize, the restricted model most consistent with the data is DP2, i.e., the one that restricts only the diagonal log odds ratios. The fact that our data support positive diagonal local log odds ratios indicates that there is homogamy in the marriage market, but not a pattern of PAM. As mentioned above, these results can be rationalized through a model of preference for own type with a marital penalty function (for further details see Proposition 4 in Siow, 2015).

4. CONCLUSION

In this paper we investigate important aspects of couple formation in Argentina. We test whether marriage is consistent with positive assortative mating along education, as in other countries. In particular, we use formal testing procedures to rigorously assess TP2 and other forms of matching. For this purpose we estimate different behavioral models using the local odds ratio approach. We also run several regressions that take into account possible endogeneity problems; to this end, we perform IV regressions using the husband's log-income as an instrument. Both the nonparametric and the parametric methods provide evidence supporting assortative mating based on education. Nevertheless, we do not find a clear pattern of PAM, which is a strong form of assortative mating.

An important issue to investigate more deeply is the link between assortative mating and the optimal tax scheme. To pursue this goal it is essential to have accurate tax data at the individual level. This requires constructing a suitable data set from different sources, which takes a significant amount of effort and time. Moreover, even with the data at hand, it is necessary to build a model of the marriage market in which males and females are heterogeneous with respect to their type/earning potential and there is centralized matching with search frictions as in Shimer and Smith (2000). In this kind of setting, matches take place in different markets in which an individual observes the types of all potential partners. In implementing this model we can proxy the market by restricting our attention to the city or county level, to capture the idea that people who live close by are more likely to marry. After marriage, the two partners form a household and choose their labor supply and consumption, where total income depends on their types. This approach ties together the marriage market and the family economy, which have so far been studied separately. It generalizes the current literature, in which either the payoffs are treated as exogenous, as in the marriage market literature, or the effect of bargaining on the marriage market is ignored. We leave this for future research.