
Research in Public #03: TWFE regression of # posts on territory fees

Introduction

In #1243188 I suggested to @k00b that we engage in a research project using SN data. The idea would be to use this data to study: A) how micropayments with real money affect internet discourse; and B) barriers to the adoption of self-custody. I also promised @Undisciplined that I'd carry out the research in public, since many people might not know what economics research looks like and may be curious about how the process plays out. You can follow all of the updates here.

Recap from last time

Last time (#1251535), I showed using a scatter plot that the quantity of posts in a territory was higher during periods in which the territory's post fees were lower. This was taken as suggestive evidence that users are responsive to territory post fees.
However, the evidence is not conclusive. There are many reasons why the correlation could be spurious. For example, if post fees were lower and activity was higher before SN went non-custodial, and post fees were higher and activity was lower afterwards, then the correlation between post fees and number of posts could be spurious, driven by a third factor: SN's decision to go non-custodial.
For another example, what if the more popular territories simply had lower post fees for reasons unrelated to the high number of posts? Bitcoin, tech, and economics are all popular subjects on SN. What if those territory founders set low post fees simply out of generosity, and the popularity is due to the subject matter rather than the low post fees?
Whenever an alternative explanation exists for the correlation between two variables, we say that there is endogeneity in the correlation. A major aspect of the art of econometrics is figuring out how to deal with endogeneity.

Two-way fixed effects regressions

To deal with the endogeneity between post fees and number of posts, I use a two-way fixed effect regression. The model is this:
\log Y_{it} = \beta \log X_{it} + \mu_i + \delta_t + \epsilon_{it}
where i indexes a territory and t indexes a week. Y_{it} is the number of posts in territory i in week t, and X_{it} is the posting fee of territory i in week t.
\mu_i is a vector of fixed effects (or dummy variables) for each territory i. It captures a baseline level of popularity for each territory. This helps us deal with endogeneity because it allows for arbitrary levels of popularity of each territory's subject matter, arbitrary levels of generosity of each territory's founder, etc. Basically, it allows us to capture any baseline level differences in the number of posts of each territory.
\delta_t is a vector of fixed effects (or dummy variables) for each week t. It captures an arbitrarily flexible time trend in Stacker News's overall popularity. This helps us deal with things changing over time, such as SN's decision to go non-custodial, or other site-wide factors that could influence the number of posts.
With these two fixed effect vectors, we control for any baseline differences in popularity between territories, and we control for any common time trends in SN's overall popularity. The regression is identifying the relationship between post fees and number of posts based on changes in the posting fee within territories. So, the thought experiment is: "When a territory changes its posting fee by dX amount, how much on average does the number of posts change in that territory?"
The two-way fixed effect regression is closely related to the difference-in-differences methodology, if anyone is familiar with that.
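For anyone who wants to see the mechanics, here is a minimal sketch on simulated data. It is not the SN data and not the code in the repo; the panel size, the simulated fee process, and the names `double_demean` and `slope` are all assumptions made for illustration. It relies on the fact that, in a balanced panel, the two-way fixed effects estimate of \beta equals the OLS slope after subtracting territory means and week means (and adding back the grand mean) from both variables. The real panel is unbalanced because archived weeks are dropped, so there you would want a regression package that handles the dummies directly.

```python
import random

def double_demean(values, units, periods):
    """Remove unit means and period means, add back the grand mean (balanced panel only)."""
    grand = sum(values.values()) / len(values)
    unit_mean = {i: sum(values[(i, t)] for t in periods) / len(periods) for i in units}
    period_mean = {t: sum(values[(i, t)] for i in units) / len(units) for t in periods}
    return {(i, t): values[(i, t)] - unit_mean[i] - period_mean[t] + grand
            for i in units for t in periods}

def slope(x, y):
    """OLS slope of y on x (with an intercept), for dicts sharing the same keys."""
    keys = list(x)
    mean_x = sum(x[k] for k in keys) / len(keys)
    mean_y = sum(y[k] for k in keys) / len(keys)
    sxy = sum((x[k] - mean_x) * (y[k] - mean_y) for k in keys)
    sxx = sum((x[k] - mean_x) ** 2 for k in keys)
    return sxy / sxx

# Simulated panel (all numbers are made up): 20 territories, 52 weeks.
rng = random.Random(0)
true_beta = -0.25
units = list(range(20))
periods = list(range(52))
mu = {i: (i - 9.5) / 5 for i in units}   # baseline popularity of each territory
delta = {t: 0.02 * t for t in periods}   # site-wide weekly trend

log_x, log_y = {}, {}
for i in units:
    for t in periods:
        # Fees are correlated with baseline popularity: this is the endogeneity problem.
        log_x[(i, t)] = 0.5 * mu[i] + rng.gauss(0, 1)
        log_y[(i, t)] = true_beta * log_x[(i, t)] + mu[i] + delta[t] + rng.gauss(0, 0.1)

naive_beta = slope(log_x, log_y)  # pooled regression, ignores both sets of fixed effects
twfe_beta = slope(double_demean(log_x, units, periods),
                  double_demean(log_y, units, periods))

print(f"true beta:  {true_beta}")
print(f"naive beta: {naive_beta:.3f}")
print(f"TWFE beta:  {twfe_beta:.3f}")
```

In this simulation the pooled slope is pulled away from the true value because popular territories were given higher fees by construction, while the within-territory, within-week comparison recovers something close to the true \beta.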

Additional data work

A quick note on additional data work. Before running the regression, I had to determine when each territory was actually active. To do that, I looked at each territory's historical paid and unpaid billing cycles and backed out when it was active vs. archived. Of course, if a territory was archived in a given week, that week is excluded from the regression.
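As a rough illustration of that filtering step, here is a sketch under heavily simplified assumptions. The data layout (a list of `(start, end, was_paid)` tuples per territory), the function name `active_week_starts`, and the rule that a week is active when its start date falls inside a paid cycle are all hypothetical; the actual SN billing data and the logic in the repo may differ, for example around grace periods.

```python
from datetime import date, timedelta

def active_week_starts(billing_cycles, first_week, last_week):
    """Return the week-start dates on which a territory counts as active.

    billing_cycles: list of (start, end, was_paid) tuples (hypothetical layout).
    A week is treated as active if its start date lies in a paid cycle [start, end).
    """
    active = []
    week = first_week
    while week <= last_week:
        if any(paid and start <= week < end for start, end, paid in billing_cycles):
            active.append(week)
        week += timedelta(days=7)
    return active

# Made-up example: paid in January, unpaid (archived) in February, paid again in March.
cycles = [
    (date(2024, 1, 1), date(2024, 2, 1), True),
    (date(2024, 2, 1), date(2024, 3, 1), False),
    (date(2024, 3, 1), date(2024, 4, 1), True),
]
weeks = active_week_starts(cycles, date(2024, 1, 1), date(2024, 3, 25))
print(len(weeks), "active weeks kept for the regression")
```

Only the territory-weeks returned by a filter like this would enter the regression sample.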

Results

Here is a table showing the estimate for \beta, the relationship between log-posting-fee and log-posts:
| Coefficient | Standard Error | t-value | p-value |
|---|---|---|---|
| -0.246^{***} | 0.011 | -22.95 | <0.01 |
This suggests that if you double your posting fee, you reduce your weekly number of posts by about 16%. If you 10x your posting fee, you reduce weekly posts by about 43%.
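The arithmetic behind those percentages: in a log-log model, multiplying the fee by k multiplies expected posts by k^{\beta}, holding the fixed effects constant. A small sketch (the function name `pct_change_in_posts` is just for illustration):

```python
def pct_change_in_posts(fee_multiplier, beta=-0.246):
    """Percent change in weekly posts implied by a log-log elasticity of beta."""
    return (fee_multiplier ** beta - 1.0) * 100.0

for k in (2, 10):
    print(f"{k}x posting fee -> {pct_change_in_posts(k):.1f}% change in weekly posts")
```

The same function can be used to read off the implied effect of a fee cut, e.g. `pct_change_in_posts(0.5)` for halving the fee.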
The fixed effects are interesting to plot as well. The chart below shows the estimated fixed effects for territories:
~bitcoin is (unsurprisingly) the most popular territory, averaging 50x as many posts as the average territory. The second most popular territory measured this way is ~tech. ~tech hasn't seemed as popular recently, with the resurgence of the ~ai territory, but remember that these results are for SN's entire history. The third most popular territory is ~news, and so on.
This chart shows the estimated fixed effects for weeks:
As expected, the chart shows steady growth in weekly posts from SN's inception in June 2021, peaking in late 2023/early 2024, being stable for most of 2024, and then starting to decline a bit in late 2024/early 2025. This is consistent with other charts and with our experience.

Next steps

Today's result adds stronger evidence that users are responsive to posting fees. Again, this is both surprising and not surprising. It's not surprising because it coheres with economic theory: demand curves slope downward. It's surprising because it holds even for such small micropayments. It suggests that people respond to micro-incentives, which is promising for the overall path of this project.
As a next step, it'll be interesting to see if post quality is responsive to posting fees. Do territories with higher posting fees screen out bad posts and attract good ones? If you set the fee too high, do you screen out bad posts but also discourage good ones? That's probably the next set of results to look at.
Anyway, that's all I have for today. Anyone who wants to vet the code can go to https://github.com/ed-kung/sn-research. I'll keep posting any time I spend a day doing substantial work on this project.
How much has the number of territories fluctuated? It might be worth including something simple, like the count of territories, to capture changes in available substitutes. I'm sure there's a much smarter way to do that that actually accounts for which territories are substitutes for each other, but I'm not an IO guy.
Also, a note for those following along: An important assumption of this model is that all territories have the same relationship between posting fees and quantity of posts. It's possible to estimate different coefficients for each territory, but that runs into limits on statistical power pretty quickly. Still, it might be interesting to explore that for the most popular territories.
reply
Any linear effect of the number of available territories should be captured in the week fixed effects. Of course, there could be nonlinear relationships that aren't modeled here.
More to the point, at some point it may be worth writing down what we think the structural model of user behavior is.
reply
Good point. I was imagining including the log of the count, but a lot of that variation would still be absorbed by the fixed effects, especially if there are always about the same number of territories.
The more relevant number of substitutes per territory isn't captured by the fixed effect, but we also don't know it.
That model of user behavior will definitely have expected returns in it, or we'll both lose our licenses to practice economics.
reply
It might be fun to try and write down a structural model that captures every incentive.
Posters are motivated by i) expected earnings on zaps, ii) post cost, iii) inherent desire to post the content. They generate post ideas at random of varying quality, then decide whether or not to post, and which territory to post in.
Territory owners are motivated by i) expected earnings on fees, ii) billing costs, iii) inherent desire to see certain content. Each month they decide whether or not to pay the billing (we'll abstract from the options for yearly/perpetual payment).
Zappers are motivated by i) cost of zaps, ii) inherent desire to see certain content.
For now, we'll make it all separable, so a person's decisions to post, run a territory, and zap are all independent of each other. I think it'd be way too hard to model a single entity with a three-fold role of poster, territory owner, and zapper.
reply
30 sats \ 0 replies \ @carter 2h
chart checks out
reply