
I had been meaning to write an analysis of America's AI Action Plan from the open source perspective last Friday, but IRL took over and I didn't have enough quiet time left to write it up. Better late than never, I hope.

Rationale for this article

From the introduction of the plan:
Winning the AI race will usher in a new golden age of human flourishing, economic competitiveness, and national security for the American people.
If it were a matter of winning and losing, what would such an outcome look like for the other 95% of the world that would then not-win? What does "losing the AI race" look like? Being even more enslaved to the whims of companies like Microsoft, which many, also in the US (#1049092), are already getting tired of?
I also wonder: is "winning" the AI race a long-term winning strategy for the US? Or is it more like the CIA helping the Taliban "win" their quest against the Soviet invader? We can only guess whether there will be blowback from this, and at what the imperial attitude will bring.
Then comes open source, or, in the case of the current closest equivalent for AI models, open weights, where everyone can download a model and execute it on their own, sovereign, computer 1. Open source doesn't really know winners or losers, but it does know progress. It is the complete opposite of winning the AI race, because the race continues in perpetuity.
In the AI space, some of the organizations that aren't market leaders in chatbots release their models as open weights, because doing so has the most impact on further research, and it often disrupts the closed, proprietary market leaders these organizations aim to keep up with. Meta has done maximum damage this way with Llama, and DeepSeek did the same with their R1 release. Many gladly ride these selfless-only-in-appearance donations to the open source community, as they allow those of us without endless, sugar-sweet VC fiat injections to build things with reasonably modern tech that would otherwise be proprietary and unobtainable except through expensive subscriptions.
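To make "download a model and execute it on your own computer" concrete, here's a minimal sketch using the Hugging Face hub and llama-cpp-python; the repo and file names are placeholders, so substitute whichever open-weight GGUF release you actually want.
```python
# Minimal sketch: fetch an open-weight model and run it locally.
# The repo_id and filename below are placeholders; any GGUF release
# of an open-weight model (Llama, DeepSeek R1 distills, ...) works.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="example-org/example-model-gguf",  # hypothetical repo
    filename="example-model-q4_k_m.gguf",      # hypothetical quantized file
)

# Load the weights on your own, sovereign, computer and prompt away.
llm = Llama(model_path=model_path, n_ctx=4096)
result = llm("Explain open weights in one sentence.", max_tokens=128)
print(result["choices"][0]["text"])
```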
Is America's AI Action Plan compatible with open source progress? Let's find out:

Pillar I: Accelerate AI Innovation

The first pillar consists of several proposed actions around what the US Government should encourage. Open source is explicitly mentioned here.

The Open Source mention

Open-source and open-weight AI models are made freely available by developers for anyone in the world to download and modify. Models distributed this way have unique value for innovation because startups can use them flexibly without being dependent on a closed model provider. They also benefit commercial and government adoption of AI because many businesses and governments have sensitive data that they cannot send to closed model vendors. And they are essential for academic research, which often relies on access to the weights and training data of a model to perform scientifically rigorous experiments.
This sounds pretty good, although it looks to me like it was clearly written by people stuck in government and Silicon Valley 2. But let's ignore that for a moment, because there is something else here:
We need to ensure America has leading open models founded on American values.
Prior instances that reference "American values" center on freedom of speech, which seems like a good goal, because censorship can impede cognitive excellence. Let's indeed not build bias into models; perhaps uncensored models already are a great open-source answer to these values 3. Since these already exist, we can consider the problem solved, even for otherwise unrelated listed actions like:
Update Federal procurement guidelines to ensure that the government only contracts with frontier large language model (LLM) developers who ensure that their systems are objective and free from top-down ideological bias.
Perhaps the USG should just use Dolphin-Mistral: a well-performing open model that is free to use! I'm sure NVIDIA would gladly provide preferential pricing for GPUs, and you don't even need huge ones to run the 24B model. Since governments work for the public, it is only beneficial to leverage open source rather than proprietary products. Vendor lock-in is avoided because these open models are interchangeable: they can share a common execution environment.
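As a sketch of what "a common execution environment" means in practice: local runtimes such as Ollama or llama.cpp expose an OpenAI-compatible HTTP API, so swapping one open model for another is a one-line change. The endpoint and model tag below are assumptions about your local setup.
```python
# Minimal sketch: talk to a locally served open model through the
# OpenAI-compatible API that runtimes like Ollama expose. Swapping
# vendors/models means changing the `model` string, nothing else.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # assumed local Ollama endpoint
    api_key="unused",                      # local servers ignore the key
)

response = client.chat.completions.create(
    model="dolphin-mistral",  # assumed local model tag; any pulled model works
    messages=[{"role": "user", "content": "Summarize vendor lock-in in one line."}],
)
print(response.choices[0].message.content)
```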

Adoption

Adoption is also mentioned, and from an open source perspective this could help fund open model efforts, if, but only if, these efforts aren't immediately captured by proprietary players pushing their proprietary models, which I expect is more likely to happen than not:
A coordinated Federal effort would be beneficial in establishing a dynamic, “try-first” culture for AI across American industry:
  • Regulatory sandboxes
  • Domain-specific efforts in healthcare, energy, and agriculture
AI-integrated manufacturing is also mentioned, mainly suggesting that the Federal government's purchasing power be used as a lever to embed the industrial AI/robotics cross-section firmly in the manufacturing sector.

Recognition of GIGO

There's a section on garbage-in-garbage-out called "Build World-Class Scientific Datasets".
Direct the National Science and Technology Council (NSTC) Machine Learning and AI Subcommittee to make recommendations on minimum data quality standards for the use of biological, materials science, chemical, physical, and other scientific data modalities in AI model training.
This sounds nice, but once more: if this is gatekept, there is a good chance that open source model engineering will have reduced exposure to the effort. I hope that, in the spirit of the earlier open source section, this will be made publicly available.
A framework for assessing AI quality is recommended too, which, if public, could help show the merits of open source. Ultimately, I think the whole of this could have a positive impact on open source AI.
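To give a feel for what "minimum data quality standards" could mean mechanically, here is a toy validation pass over a tabular scientific dataset; the column names, file name, and thresholds are invented for illustration, and a real standard would be set per modality.
```python
# Toy sketch of a minimum-data-quality gate for a tabular scientific
# dataset. Column names and thresholds are hypothetical; real standards
# would differ per modality (biological, materials, chemical, ...).
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_fraction": float(df.isna().mean().mean()),
        # Domain sanity check: a hypothetical measurement column that
        # must be positive to be physically meaningful.
        "nonpositive_measurements": int((df["measurement"] <= 0).sum()),
    }

def passes_minimum_standard(report: dict) -> bool:
    return (
        report["duplicate_rows"] == 0
        and report["missing_fraction"] < 0.01  # arbitrary 1% threshold
        and report["nonpositive_measurements"] == 0
    )

df = pd.read_csv("dataset.csv")  # hypothetical input file
report = quality_report(df)
print(report, "PASS" if passes_minimum_standard(report) else "FAIL")
```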

National security

There are lots of items about national security, including the singling out of Chinese-made models. This is the worrying part, because open source is agnostic to location (your server may not be, but that is easily circumvented, because open licenses often allow re-packaging and redistribution). I worry that these considerations could gain the upper hand: if a choice has to be made between opening up models and boosting a commercial provider that can be strictly regulated, the latter may fit the national security portion of the bill better.

Pillar II: Build American AI Infrastructure

While most of this pillar focuses on building datacenters and deregulating to reduce some environmental friction (in regulation, not IRL, of course), there is also a section about secure-by-design AI.
AI systems are susceptible to some classes of adversarial inputs (e.g., data poisoning and privacy attacks), which puts their performance at risk. The U.S. government has a responsibility to ensure the AI systems it relies on—particularly for national security applications—are protected against spurious or malicious inputs. While much work has been done to advance the field of AI Assurance, promoting resilient and secure AI development and deployment should be a core activity of the U.S. government.
This links back to the GIGO topic from Pillar I. It would be good to have resilient AI models and surrounding tooling implementations on the open source side of things too, or maybe especially there.
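One common and simple building block for resilience against data poisoning is outlier filtering of training examples before they ever reach the model. Below is a hedged sketch using embedding-distance z-scores; all names and the synthetic data are invented for illustration, and this is a heuristic, not a complete defense.
```python
# Toy sketch of one anti-poisoning measure: drop training examples whose
# embeddings sit far from the bulk of the data. Real pipelines would
# layer provenance and integrity checks on top of this.
import numpy as np

def filter_outliers(embeddings: np.ndarray, z_threshold: float = 3.0) -> np.ndarray:
    """Return indices of examples within z_threshold of the centroid."""
    centroid = embeddings.mean(axis=0)
    dists = np.linalg.norm(embeddings - centroid, axis=1)
    z = (dists - dists.mean()) / (dists.std() + 1e-12)
    return np.where(z < z_threshold)[0]

# Hypothetical data: 1000 "clean" embeddings plus 10 injected outliers.
rng = np.random.default_rng(0)
clean = rng.normal(0, 1, size=(1000, 64))
poisoned = rng.normal(8, 1, size=(10, 64))
data = np.vstack([clean, poisoned])

keep = filter_outliers(data)
print(f"kept {len(keep)} of {len(data)} examples")
```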

Pillar III: Lead in International AI Diplomacy and Security

The "diplomacy" part of this pillar feels a bit misleading, because it's mostly about protective measures, such as increased export control on hardware, counter-influencing international communities, surveillance and another round of national security concerns.

Conclusion

While open source is an explicit topic in the plan, many other actions outlined are of questionable usefulness, if not outright incompatible with open source principles and mechanics, such as export controls and surveillance. Because of this, from an open source perspective, the plan has internal conflicts for which we will have to wait and see which side gains the upper hand: open collaboration or protectionism.
Perhaps through some of the mentioned initiatives, such as curating quality model source data, regulatory sandboxes, and opportunities to build specific solutions for targeted industries, the open source community will get opportunities to showcase the merit of open collaboration.
We'll have to keep building, which can be done with encouragement, or despite discouragement from the USG.

Footnotes

  1. One of my personal favorite devices for running small LLMs and other transformer-based models is my "old" MacBook Pro with an M1 chip. Due to errors in the physical chip architecture, Apple M1/M2 devices can no longer serve as a super-secure workspace, but they have a unified memory architecture similar to the more recent M4 chips, plus a built-in neural engine, and are thus fine for activities that don't require my PGP signature or a secure compile environment. Providing compute for small LLM models works okay, though it's a bit slow at times.
  2. It doesn't mention individuals at all, but I'd posit that individuals have even more sensitive, private data that they shouldn't under any circumstances share with a personal-data-harvesting company like Google, Meta, or OpenAI.
  3. It's also much less offensive when your own computer makes a cognitive error than when xAI's $120/mo chatbot insults you. That said, I haven't found many issues with uncensored models that had much or all of their "alignment" removed, like dolphin-mistral. They don't insult you out of the blue, unless you ask for it or manipulate them into generating answers about things that insult you.
116 sats \ 3 replies \ @SatsMate 2h
Pretty interesting! I think open-sourcing all AIs should be key. It is kind of scary what these closed-source software companies could be doing with the information we feed their AIs.
According to Proton's comparison (do take it with at least one grain of salt): no good things.


21 sats \ 1 reply \ @SatsMate 1h
Cool, I may actually give Proton a try; it definitely looks like a friendlier company from a privacy/user-rights standpoint.
It's still better to run your own, but if you don't have acceptable hardware for that, this could at least help in the meantime. From my initial questioning, it didn't feel close to the performance I see from some of the (even smaller) latest-generation open source LLMs.