Review of “Social Capital, Inequality, and Environment in Southeast Asian Cities.”

Amrita Daniere et al.

 

The authors of this paper are asking an important question: “how might public policy makers recognize, support, and enhance the existing and potential attributes of well-governed communities?”  They are also asking this question in a critical context: among urban slums in Asia, where public policy makers should be able, at the margin, to do a lot of good.  And the authors have chosen a sound research strategy for addressing this and related questions: to use the concept of social capital to explain why some communities are better able to solve their collective action and coproduction problems, and then to collect survey data to test how urban slum communities differ -- in demographics, social capital, health and environmental outcomes.  I am very sympathetic to this kind of work: it is an important area of research which deserves plenty of academic and policy attention.

 

Nevertheless, I cannot recommend this article for EDCC. While the research strategy is sound, the use of the survey data is very problematic, so the policy implications that flow from the data analysis, while likely correct, cannot be justified from the analysis in the paper. As I detail at the end of this review, I recommend that the authors adopt one of two alternative approaches to dramatically recast this paper.

 

The authors collected data from 1000 households in 10 communities in Bangkok and Ho Chi Minh City. This is an impressive feat in and of itself, and the documentation of this process in the paper is quite good. These rich data are then used to construct seven tables of community-level means, the last of which is a summary table of data that appeared in earlier tables. In the heart of the paper, the analytical strategy of the authors is to (a) describe a variable of interest; (b) summarize the means of the variable within and across the two cities; and (c) make inferences from the perceived differences in the means. As the analysis continues, the authors then (d) make inferences about the perceived relationships between means of different variables. So, for example, in their most interesting finding, they show that in communities with high inequality, some measures of social integration are low. In their final section, they use the perceived differences in means and relationships between means to make a set of very specific policy recommendations.

 

This analytical strategy is highly problematic, and it does not do justice to the excellent data that these authors have collected. Taking the paper at face value, let’s assume that the interesting conclusions to be reached from these data actually do concern inter-community and inter-city comparisons. In this case, one must use statistical tests (parametric or, in this case, probably non-parametric, like Mann-Whitney) to show that the differences in means are statistically significant. I find it shocking that not a single mention is made in this paper of statistical significance until page 18, where they use basic correlation tests. (It is possible that every time the authors use the word ‘significant’ they mean ‘statistically significant.’ But this is unlikely. And if indeed this is the case, they are obliged to report the nature of their tests, significance levels, etc.)
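To make this request concrete, here is a minimal sketch of the kind of test I have in mind. It is not the authors' method: it implements a two-sided Mann-Whitney U test with the normal approximation and a tie correction (no continuity correction), using only the Python standard library. The two samples, `city_a` and `city_b`, are simulated household sizes that I have made up purely for illustration; they are not the survey data.

```python
import math
import random


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation identical: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, p_value


# Hypothetical household sizes for two cities (simulated, NOT the survey data).
random.seed(0)
city_a = [random.choice([2, 3, 3, 4, 4, 5, 6]) for _ in range(500)]
city_b = [random.choice([2, 2, 3, 3, 4, 4, 5]) for _ in range(500)]
u_stat, p = mann_whitney_u(city_a, city_b)
print(f"U = {u_stat:.1f}, two-sided p = {p:.4f}")
```

Reporting the test statistic and p-value alongside each pair of means, as in the last line, would let the reader judge which of the differences discussed in the text are distinguishable from sampling noise.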

 

Of course, even well-done statistical tests like this would only get one so far, since untangling causality among variables related to health, the environment and social capital is tricky at best. Well, in the social sciences, we have a technique for this: regression analysis! So, with communities (or their equivalent) as the unit of analysis, this is the approach of folks like Alesina and La Ferrara (2000): what determines outcomes in community j, where the regressors are community-level means.

 

The authors of this paper, of course, cannot avail themselves of regression analysis where the community is the unit of analysis, since they only have 10 communities. But they certainly can address the same set of interesting questions about social capital, inequality, and environment by using the household as the unit of analysis: what determines outcomes of household i in community j, where the regressors are related to household i as well as community-level means. This would increase their N to 1000, and allow for some fascinating intra- and inter-community comparisons. Indeed, it is this approach that is used successfully with similar survey data in the two recent EDCC articles that these authors are partially emulating: Narayan and Pritchett (1999) and Isham and Kähkönen (2002).
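As an illustration of the household-level specification described above, the following is a rough sketch under stated assumptions, not a recommended final model. It fits an ordinary least squares regression of an outcome for household i in community j on one household-level regressor and one community-level regressor. All variable names (`income_ij`, `inequality_j`), the coefficients used to generate the data, and the data themselves are hypothetical and simulated. Because households in the same community share the community-level regressor, standard errors would need to account for clustering by community; this sketch reports point estimates only.

```python
import random


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[a] * row[b] for row in X) for b in range(k)]
        + [sum(row[a] * yi for row, yi in zip(X, y))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Simulated data: 10 communities x 100 households (NOT the survey data).
random.seed(0)
rows, outcomes = [], []
for j in range(10):
    inequality_j = random.uniform(0.1, 0.6)  # hypothetical community-level index
    for i in range(100):
        income_ij = random.gauss(0, 1)  # hypothetical household-level regressor
        # Assumed data-generating process, chosen only for illustration.
        y_ij = 1.0 + 0.5 * income_ij - 2.0 * inequality_j + random.gauss(0, 1)
        rows.append([1.0, income_ij, inequality_j])
        outcomes.append(y_ij)

b0, b1, b2 = ols(rows, outcomes)
print(f"intercept = {b0:.2f}, household income = {b1:.2f}, "
      f"community inequality = {b2:.2f}")
```

Note that the coefficient on the community-level regressor is still identified from only 10 distinct values, so it will be estimated much less precisely than the household-level coefficient even with N = 1000.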

 

Now, let me back up and give the authors a chance: perhaps one might say that in so many cases, the differences of means are so large that Mann-Whitney or other such tests are extraneous; and perhaps it is true that the associations between the means of different variables are so compelling, and so free of simultaneity or omitted variable bias, that one can easily connect the dots between social capital and outcomes without advanced statistical tests.

 

Let’s take each of these possibilities in turn. On page 12, the authors state that ‘there appear to be some differences between the two cities in terms of household composition …’ But the (unweighted) means across the two cities are not that different: 3.69 vs. 3.31. If these differences are not statistically significant, what is the reader to make of the comment that concludes this paragraph: “It is more likely that older children remain in the household longer in Ho Chi Minh City than in Bangkok”? Not only have the authors not made a convincing (statistical) case that there is any difference; how is the reader to assess whether this explanation is ‘more likely’? There are plenty of other explanations: relative cost of housing, relative wages, marriage traditions; the list goes on.

 

Unfortunately, this is the same type of reasoning that the authors too often use: the declaration of meaningful differences between two communities – or across the two countries – based on a small (untested) difference in means of a related variable, followed by an unsupported speculation to explain this difference. 

 

Yes, in some cases the differences are clearly large, so Mann-Whitney is not really necessary, but even in these cases the inferences are often unjustified and/or unsupported. For example: “The most distrustful community is clearly Community A (Tan Dinh), which experiences numerous government planner visits” (p. 16). The authors are implying that the government planner visits explain the lower mean of this variable compared to the other slums, but there are so many other possible explanations for the co-occurrence of these phenomena that one doesn’t know where to begin! (The weakness of this approach is revealed in the concluding pages, where the authors often fall back on statements like ‘seems to be associative’ and ‘might be related.’)

 

Now, what about the possibility of an unambiguous association between certain variables? Their best ‘result’ is the one, summarized on pp. 18 and 19, that shows a statistically significant correlation between their Theil index and the social capital index. They then state that “this is firm evidence that social capital and income distribution are related to each other.” But that conclusion, while attractive, is simply not justified. There are plenty of other variables that could affect both social capital and income distribution (migration, ethnicity, education; again, the list goes on), so the association could be completely spurious. Take a longstanding example from the social sciences in the United States: many ‘scholars’ (recently, Charles Murray comes to mind) have shown that indicators of race are associated with indicators of intelligence, so they reach conclusions like the authors of this paper: “this is firm evidence that race and intelligence are related to each other.” But the relationship is, of course, completely spurious: consider income, education, job opportunities, etc.
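The omitted-variable concern can be illustrated with a small simulation. This is a sketch under assumptions of my own, not an analysis of the authors' data: a hypothetical third variable `z` (think of it as, say, a community's share of recent migrants) drives both `x` and `y`, while `x` has no direct effect on `y` at all. The raw correlation between `x` and `y` is nonetheless substantial, and it largely disappears once `z` is controlled for through the standard first-order partial correlation formula.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """Correlation between x and y after controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Simulated data (hypothetical): z causes both x and y; x does not cause y.
random.seed(1)
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]   # e.g. an inequality measure
y = [-zi + random.gauss(0, 1) for zi in z]  # e.g. a social capital index

raw = pearson(x, y)
controlled = partial_corr(x, y, z)
print(f"raw correlation = {raw:.2f}, controlling for z = {controlled:.2f}")
```

A significant bivariate correlation of the kind reported on pp. 18 and 19 is therefore compatible with no direct relationship at all, which is why controls for the obvious candidate variables are needed before calling the evidence firm.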

 

Some additional notes are below:

 

 

In conclusion, I cannot recommend that this paper be submitted to another top-tier journal in its current form.

 

However, I believe that the authors have some great material here, and, personally, I believe in the validity of many of their conclusions. So one of two major alterations is called for. On the one hand, their paper can be transformed into a ‘qualitative piece,’ where the authors focus on the histories of the two countries and the ten slums to convincingly make their analytical case. Under this approach, the means of the data would complement historical and case-study evidence, not be the heart and soul of the paper. On the other hand, the paper could be substantially strengthened econometrically, and then the data would convincingly take the lead in the analytics. (As the authors put it themselves (pp. 21-22): “therefore, clearly more rigorous statistical work … needs to be completed before we can make definitive statements.”) Again, this was the (successful) strategy of the two EDCC pieces noted above.

 

I want to finish by noting that I believe that the authors are doing important work, so I encourage them to tell their story about social capital, inequality, and environment in a more convincing fashion – either with a richer qualitative or quantitative approach.