Research in Public #03: TWFE regression of # posts on territory fees
- Introduction
- Recap from last time
- Two-way fixed effects regressions
- Additional data work
- Results
- Next steps
Introduction
In #1243188 I suggested to @k00b that we engage in a research project using SN data. The idea would be to use this data to study: A) how micropayments with real money affect internet discourse; and B) barriers to the adoption of self-custody. I also promised @Undisciplined that I'd carry out the research in public, since many people might not know what economics research looks like, and may be curious as to how the process plays out. You can follow all of the updates here.
Recap from last time
Last time (#1251535), I showed using a scatter plot that the quantity of posts in a territory was higher during periods in which the territory's post fees were lower. This was taken as suggestive evidence that users are responsive to territory post fees.
However, the evidence is not conclusive. There are many reasons why the correlation could be spurious. For example, if post fees were lower before SN went non-custodial and activity was higher, and if post fees were higher after SN went non-custodial and activity was lower, then the correlation between post fees and number of posts is actually spurious, driven by a third factor: SN's decision to go non-custodial.
For another example, what if the more popular territories simply had lower post fees for reasons unrelated to the high number of posts? For example, Bitcoin, tech, and economics are all popular subjects on SN. What if those territory founders set low post fees simply out of their generosity, but the popularity is due to the subject matter and not the low post fees?
Whenever an alternative explanation exists for the correlation between two variables, we say that there is endogeneity in the correlation. A major aspect of the art of econometrics is figuring out how to deal with endogeneity.
Two-way fixed effects regressions
To deal with the endogeneity between post fees and number of posts, I use a two-way fixed effect regression. The model is this:
$$\log Y_{it} = \beta \log X_{it} + \mu_i + \delta_t + \epsilon_{it}$$

where $i$ indexes a territory and $t$ indexes a week. $Y_{it}$ is the number of posts in territory $i$ in week $t$, and $X_{it}$ is the posting fee of territory $i$ in week $t$.

$\mu_i$ is a vector of fixed effects (or dummy variables) for each territory $i$. It captures a baseline level of popularity for each territory. This helps us deal with endogeneity because it allows for arbitrary levels of popularity of each territory's subject matter, arbitrary levels of generosity of each territory's founder, etc. Basically, it allows us to capture any baseline level differences in the number of posts of each territory.

$\delta_t$ is a vector of fixed effects (or dummy variables) for each week $t$. It captures an arbitrarily flexible time trend in Stacker News's overall popularity. This helps us deal with things changing over time, such as SN's decision to go non-custodial, or other site-wide factors that could influence the number of posts.

With these two fixed effect vectors, we control for any baseline differences in popularity between territories, and we control for any common time trends in SN's overall popularity. The regression identifies the relationship between post fees and number of posts based on changes in the posting fee within territories. So, the thought experiment is: "When a territory changes its posting fee by an amount $dX$, how much on average does the number of posts change in that territory?"

The two-way fixed effect regression is closely related to the difference-in-differences methodology, if anyone is familiar with that.
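To make the role of the two fixed effects concrete, here is a minimal sketch on simulated data. It is not the code used for the results in this post (that lives in the repo linked at the end), and every number in it is made up for illustration: 30 territories, 100 weeks, and a true $\beta$ of -0.25. It also assumes a balanced panel, in which case the two-way fixed effects estimate can be computed by subtracting territory means and week means from each variable and adding back the grand mean. The real SN panel is not balanced, so a regression package that handles dummy variables directly would be the more natural tool there. In the simulation, popular territories are given lower fees, which mimics the "generous founder" story from the recap.

```python
import random

def demean_two_way(values):
    """Subtract territory (row) and week (column) means, add back the grand mean."""
    n, t = len(values), len(values[0])
    row_means = [sum(row) / t for row in values]
    col_means = [sum(values[i][j] for i in range(n)) / n for j in range(t)]
    grand = sum(row_means) / n
    return [[values[i][j] - row_means[i] - col_means[j] + grand
             for j in range(t)] for i in range(n)]

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den

random.seed(0)
TRUE_BETA = -0.25                 # illustrative value, not an SN estimate
N_TERRITORIES, N_WEEKS = 30, 100  # illustrative panel size

week_effect = [random.gauss(0, 0.5) for _ in range(N_WEEKS)]  # delta_t
log_fee, log_posts = [], []
for i in range(N_TERRITORIES):
    mu_i = random.gauss(0, 1)     # baseline popularity of territory i
    base_fee = 3.0 - mu_i         # popular territories charge lower fees
    fee_row, post_row = [], []
    for t in range(N_WEEKS):
        x = base_fee + random.gauss(0, 0.5)  # within-territory fee changes
        y = TRUE_BETA * x + mu_i + week_effect[t] + random.gauss(0, 0.3)
        fee_row.append(x)
        post_row.append(y)
    log_fee.append(fee_row)
    log_posts.append(post_row)

# Pooled regression ignores the fixed effects and picks up the popularity bias.
beta_pooled = ols_slope(log_fee, log_posts)
# Two-way demeaning removes mu_i and delta_t before estimating the slope.
beta_twfe = ols_slope(demean_two_way(log_fee), demean_two_way(log_posts))

print(f"pooled OLS slope: {beta_pooled:.3f}")
print(f"TWFE slope:       {beta_twfe:.3f} (true value {TRUE_BETA})")
```

The pooled slope overstates the effect of fees because it attributes the popularity gap between territories to their fees, while the demeaned regression only uses fee changes within a territory, net of site-wide weekly movements.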
Additional data work
Just a quick note on additional data work. Before running the regression, I had to make sure I knew when each territory was actually active or not. To do that, I had to look at the historical paid and unpaid billing cycles of each territory and back out when a territory was active vs. archived. Of course, if a territory was archived in a given week, we would not include that week in the regression.
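As a rough illustration of that step, the sketch below turns a list of paid billing cycles into the set of weeks in which a territory counts as active. The data layout is hypothetical: it assumes each paid cycle is available as a `(start, end)` pair of dates with an exclusive end, and it treats a week as active if any part of it falls inside a paid cycle. The actual SN billing data and the rule used for the results here may differ.

```python
from datetime import date, timedelta

def week_start(d):
    """Monday of the week containing date d."""
    return d - timedelta(days=d.weekday())

def active_weeks(paid_cycles):
    """Week-start dates touched by any paid cycle.

    paid_cycles: list of (start, end) date pairs, end exclusive (assumed layout).
    """
    weeks = set()
    for start, end in paid_cycles:
        w = week_start(start)
        while w < end:
            weeks.add(w)
            w += timedelta(days=7)
    return weeks

# Hypothetical territory: paid in January and March 2024, unpaid in February.
cycles = [
    (date(2024, 1, 1), date(2024, 2, 1)),
    (date(2024, 3, 1), date(2024, 4, 1)),
]
weeks = active_weeks(cycles)
print(sorted(weeks))
```

Territory-weeks that are not in this set would be dropped from the panel before estimation.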
Results
Here is a table showing the estimate for $\beta$, the relationship between log-posting-fee and log-posts:

Coefficient | Standard Error | t-value | p-value |
---|---|---|---|
$-0.246^{***}$ | 0.011 | -22.95 | <0.01 |
This suggests that if you double your posting fee, you reduce your weekly number of posts by about 16%. If you 10x your posting fee, you reduce weekly posts by about 43%.
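Those percentages follow from the log-log form of the model: because both posts and fees enter in logs, the coefficient is an elasticity, and multiplying the fee by $k$ multiplies expected posts by $k^{\beta}$, holding the territory and week effects fixed. A short check of the arithmetic, using only the point estimate from the table above and ignoring its standard error (the function name is just for illustration):

```python
def pct_change_in_posts(beta, fee_multiplier):
    """Fractional change in posts when the fee is multiplied by fee_multiplier,
    under a log-log model with elasticity beta."""
    return fee_multiplier ** beta - 1.0

BETA_HAT = -0.246  # point estimate from the table above

for k in (2, 10):
    change = pct_change_in_posts(BETA_HAT, k)
    print(f"{k}x fee: {change:+.1%} change in weekly posts")
```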
The fixed effects are interesting to plot as well. The chart below shows the estimated fixed effects for territories:
~bitcoin is (unsurprisingly) the most popular territory, averaging 50x as many posts as the average territory. The second most popular territory measured this way is ~tech. ~tech hasn't seemed to be as popular recently, with the resurgence of the ~ai territory, but remember that these results are for SN's entire history. The third most popular territory is ~news, and so on.
This chart shows the estimated fixed effects for weeks:
As expected, the chart shows steady growth in weekly posts from SN's inception in June 2021, peaking in late 2023/early 2024, being stable for most of 2024, and then starting to decline a bit in late 2024/early 2025. This is consistent with other charts and with our experience.
Next steps
Today's result adds stronger evidence that users are responsive to posting fees. Again, this is both surprising and not surprising. It's not surprising because it coheres with economic theory: demand curves slope downward. It's surprising because the effect shows up even for such small micropayments. It suggests that people respond to micro-incentives, which is promising for the overall path of this project.
As a next step, it'll be interesting to see if post quality is responsive to the posting fee. Do territories that have higher posting fees screen out bad posts and attract good ones? If you set the fee too high, do you screen out bad posts but also discourage good ones? That's probably the next set of results to try and look at.
Anyway that's all I have for today. Anyone who wants to vet the code can go to https://github.com/ed-kung/sn-research. I'll keep posting any time I spend a day doing substantial work on this project.