AI needs Web3
Author: cointime
Until recently, startups led the way in technological innovation thanks to their speed, agility, entrepreneurial culture, and freedom from organizational inertia. In the fast-moving age of AI, however, things have changed: so far, the groundbreaking AI products have come from large tech giants such as Microsoft-backed OpenAI, Nvidia, Google, and even Meta.
So what changed? Why did the “Goliaths” outperform the “Davids” this time? Startups can write great code, but they often cannot compete with the tech giants because of several challenges:
1. The cost of computation is still extremely high;
2. AI suffers from a “reverse salient”: without the necessary regulatory safeguards, concerns and uncertainty about its social impact hold back innovation;
3. AI is a black box;
4. The data moat of entrenched incumbents (the large tech companies) raises barriers to entry for emerging competitors.
So where does blockchain technology intersect with artificial intelligence? Although it is no panacea, DePIN (decentralized physical infrastructure networks) in Web3 can strengthen AI by addressing the challenges above. In this article, I explain how the technology behind DePIN can enhance artificial intelligence along four dimensions:
1. Reduce infrastructure costs;
2. Verify creatorship and humanity;
3. Inject democracy and transparency into AI;
4. Create incentives for data contribution.
In the context of this article,
1. “Web3” is defined as the next generation of the Internet, of which blockchain technology is an important part, as well as other existing technologies;
2. “Blockchain” refers to decentralized and distributed ledger technology;
3. “Cryptocurrency” refers to the use of tokens as a reward and decentralization mechanism.
First, lower infrastructure costs (computing and storage)
The importance of infrastructure affordability (in the context of AI, the hardware cost of computing, transmitting, and storing data) is highlighted by Carlota Perez’s technological revolutions framework, which proposes that every technological breakthrough passes through two phases:
1) The installation phase is characterized by heavy VC investment, infrastructure build-out, and a “push” go-to-market (GTM) approach, since the new technology’s value proposition is not yet clear to customers.
2) The deployment phase is characterized by a rapid increase in infrastructure supply, which lowers barriers to entry for new entrants, and by a “pull” GTM approach: customers are hungry for products that do not yet exist, a sign of strong product-market fit.
Given that ChatGPT already has clear product-market fit and enormous customer demand, one might think AI has entered the deployment phase.
However, one thing is still missing: a surplus of infrastructure supply cheap enough for price-sensitive startups to build on and experiment with.
1. Problem
The problem is that today’s physical infrastructure market is a vertically integrated oligopoly in which companies such as AWS, GCP, Azure, Nvidia, Cloudflare, and Akamai enjoy high margins. AWS, for example, is estimated to earn a 61% gross margin on commodity computing hardware.
- For AI newcomers, compute costs are prohibitively high, especially for LLMs.
- ChatGPT cost roughly $4 million to train, with hardware inference costs of roughly $700,000 per day.
- The second version of BLOOM is expected to cost $10 million to train and retrain.
- If ChatGPT were deployed in Google Search, Google’s revenue would fall by $36 billion, a huge transfer of profit from the software platform (Google) to the hardware supplier (Nvidia).
2. Solution
DePIN networks such as Filecoin, Bacalhau, Render Network, and ExaBits can deliver infrastructure cost savings of 75%-90% or more through the three levers below. Filecoin has pioneered the aggregation of internet-scale hardware for decentralized data storage since 2014, while Bacalhau, Render Network, and ExaBits are coordination layers that match demand with CPU/GPU supply. (Disclosure: the author is a former employee of Protocol Labs and an advisor to ExaBits.)
1) Expand the supply curve and create a more competitive market
DePIN lowers the barrier for hardware suppliers to become service providers. It creates a marketplace in which anyone can join the network as a “miner”, offering CPU/GPU or storage capacity in exchange for financial rewards, and thereby brings competition to the incumbents.
Although companies like AWS undoubtedly enjoy a 17-year head start in user experience, operational excellence, and vertical integration, DePIN unlocks a new customer base that centralized providers previously priced out. Just as eBay does not compete head-on with Bloomingdale’s but offers more affordable alternatives for similar needs, DePIN networks do not replace centralized providers; they aim to serve a more price-sensitive user base.
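To make the marketplace mechanics concrete, here is a minimal sketch in Python of how a coordination layer might match a compute job to the cheapest supplier with enough capacity. The miner data and interfaces are hypothetical, not any real DePIN network’s API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Offer:
    miner_id: str               # pseudonymous hardware supplier on the network
    gpu_count: int              # capacity offered
    price_per_gpu_hour: float   # asking price, e.g. in tokens

def match_job(offers: list[Offer], gpus_needed: int) -> Optional[Offer]:
    """Select the cheapest offer that can serve the job.

    With open entry, suppliers undercut each other on price
    (Bertrand-style competition), pushing costs toward marginal cost.
    """
    eligible = [o for o in offers if o.gpu_count >= gpus_needed]
    return min(eligible, key=lambda o: o.price_per_gpu_hour, default=None)

offers = [
    Offer("miner-a", 8, 2.10),
    Offer("miner-b", 16, 1.75),
    Offer("miner-c", 4, 0.90),   # cheapest, but too small for this job
]
print(match_job(offers, gpus_needed=8))  # -> miner-b wins on price
```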
2) Balance the economics of these markets through cryptoeconomic design
DePIN creates a subsidy mechanism to encourage hardware suppliers to participate in the network, thereby reducing end-user costs. To understand how it works, let’s first compare the costs and revenues of storage suppliers in AWS and Filecoin.
A. DePIN networks reduce costs for customers by creating competitive markets with Bertrand-style price competition. By contrast, AWS EC2 must sustain an estimated 50% gross margin and a 31% overall margin to keep operating.
B. DePIN networks can subsidize suppliers further by issuing token rewards/block rewards as a new revenue source. On Filecoin, hosting more real client data earns storage providers more block rewards (tokens), so providers are incentivized to attract more customers and win more deals to maximize revenue. The token structures of several emerging compute DePIN networks are still confidential, but they may follow a similar pattern; a toy illustration of such a reward rule follows the list below. Examples of these networks include:
Bacalhau: a coordination layer that brings computation to where data is stored, without the need to move large amounts of data
ExaBits: a distributed computing network designed for AI and compute-intensive applications
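As an illustration only: the reward curves of these compute networks are not public, and Filecoin’s actual mechanism is considerably more involved, but a Filecoin-style rule that pays block rewards in proportion to the useful client data each provider hosts could look like this:

```python
def distribute_block_reward(reward: float,
                            stored_bytes: dict[str, int]) -> dict[str, float]:
    """Split one epoch's block reward among storage providers
    in proportion to the client data each one hosts."""
    total = sum(stored_bytes.values())
    if total == 0:
        return {provider: 0.0 for provider in stored_bytes}
    return {provider: reward * b / total for provider, b in stored_bytes.items()}

# Hosting more real client data earns a larger share of the token reward,
# so providers compete to win storage deals.
print(distribute_block_reward(100.0, {"sp-1": 600, "sp-2": 300, "sp-3": 100}))
# {'sp-1': 60.0, 'sp-2': 30.0, 'sp-3': 10.0}
```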
3) Lower cost overhead
The benefits of DePIN networks such as Bacalhau and ExaBits, and of IPFS/content-addressable storage, include:
A. Creating availability from latent data: large amounts of data sit unused because moving big datasets is bandwidth-expensive. Sports venues, for instance, generate large volumes of event data that currently go to waste. DePIN projects unlock such latent data by processing it on site and transmitting only the meaningful outputs.
B. Lowering operating costs by ingesting data locally, reducing the expense of data input, transmission, and import/export.
C. Minimizing the manual work of sharing sensitive data: if Hospitals A and B need to combine their sensitive patient data for analysis, for example, they can use Bacalhau to coordinate GPU power and process that data in place, rather than exchanging PII (personally identifiable information) through tedious administrative procedures.
D. Eliminating the need to recompute base datasets: IPFS/content-addressable storage has built-in properties for deduplicating, tracking the lineage of, and verifying data. Further reading is available on the features and cost benefits of IPFS.
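To illustrate the deduplication point, here is a toy content-addressed store in Python. A plain SHA-256 digest stands in for the address; real IPFS CIDs add multihash and codec framing on top of the same idea:

```python
import hashlib

store: dict[str, bytes] = {}  # address -> content

def put(content: bytes) -> str:
    """Store content under the hash of the content itself."""
    address = hashlib.sha256(content).hexdigest()
    store[address] = content  # identical bytes map to the same key: free dedup
    return address

a = put(b"training-batch-0042")
b = put(b"training-batch-0042")   # same bytes, same address, no second copy
assert a == b and len(store) == 1

# A changed dataset gets a new address, so lineage stays explicit:
c = put(b"training-batch-0042, cleaned")
assert c != a and len(store) == 2
```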
3. Summary
DePIN networks such as Filecoin, Bacalhau, Render Network, and ExaBits can deliver 75%-90%+ cost savings by opening hardware supply to anyone, introducing competition, balancing market economics through cryptoeconomic design, and lowering cost overheads.
Second, verify creatorship and humanity
1. Problem
A recent survey found that 50% of AI scientists believe there is at least a 10% chance that AI leads to human extinction.
This is a sobering thought. AI is already causing societal disruption, and we still lack regulatory or technical assurance frameworks: the “reverse salient” mentioned earlier.
Unfortunately, the social impact of AI goes far beyond fake podcast debates and images:
1) The 2024 US presidential election cycle will feature AI-generated deepfake political campaigns that are difficult to distinguish from the real thing.
2) A video of Senator Elizabeth Warren was doctored to make it appear that Warren said Republicans should not be allowed to vote (since debunked).
3) A voice clone imitating Biden criticizing transgender women.
4) A group of artists filed a class-action lawsuit against Midjourney and Stability AI, accusing them of using the artists’ works without authorization to train image-generating AI that imitates those works, infringing their copyrights and threatening their livelihoods.
5) “Heart on My Sleeve”, a deepfake AI-generated track featuring cloned vocals of Drake and The Weeknd, went viral before streaming services took it down. The copyright controversy is a harbinger of the complications that arise when a new technology reaches mainstream consciousness without the necessary rules in place; in other words, another reverse salient.
What if we could counter these problems with cryptographic proofs in Web3?
2. Solution
1) Prove creatorship and humanity with cryptographic proofs of provenance on the blockchain
Here we can leverage blockchain technology as a distributed ledger of immutable records, which makes it possible to verify the authenticity of digital content by checking its cryptographic proof.
2) Digital signatures prove the creator’s identity and humanity
To counter deepfakes, a digital signature can serve as cryptographic proof tied to the unique original creator of a piece of content. The signature is created with a private key known only to the creator and can be verified with a public key available to anyone. Attached to the content, it proves who created the content, whether human or AI, and whether any subsequent changes were authorized.
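A minimal sketch of that signing flow, using Ed25519 via the `cryptography` Python package (key distribution and binding a key to a verified identity are out of scope here):

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# The creator signs the content with a private key only they hold.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

content = b"original article text or media bytes"
signature = private_key.sign(content)

# Anyone with the public key can verify authenticity; verify() is silent on success.
public_key.verify(signature, content)

# Any unauthorized change breaks the signature.
try:
    public_key.verify(signature, content + b" [tampered]")
except InvalidSignature:
    print("Tampering detected: signature does not match the content.")
```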
3) Use IPFS and Merkle trees to prove authenticity
IPFS is a decentralized protocol that uses content addressing and Merkle trees to reference large datasets. To prove a change to a file’s contents, a Merkle proof is generated: a list of hashes showing where a specific data block sits in the Merkle tree. Every change produces a new hash and an updated Merkle tree, providing verifiable evidence of the modification.
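A compact sketch of that mechanism: building a Merkle tree over data blocks and verifying an inclusion proof for one block. It is deliberately simplified (power-of-two leaf count, bare SHA-256) and omits IPFS’s production encodings:

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag marks a right-hand sibling."""
    proof, level, i = [], [h(leaf) for leaf in leaves], index
    while len(level) > 1:
        sibling = i + 1 if i % 2 == 0 else i - 1
        proof.append((level[sibling], i % 2 == 0))
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return proof

def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    acc = h(leaf)
    for sibling, sibling_is_right in proof:
        acc = h(acc + sibling) if sibling_is_right else h(sibling + acc)
    return acc == root

blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
root = merkle_root(blocks)
proof = merkle_proof(blocks, index=2)
assert verify(b"block-2", proof, root)                # authentic block
assert not verify(b"block-2 (edited)", proof, root)   # any edit changes the hashes
```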
Third, inject democracy and transparency into AI
1. Problem
1) Today
A. For consumers: users have no way to feed back into the models they use, nor to share in the value those models create
B. For developers:
There is little reproducibility, because the ETL operations executed on the data leave no traceable record
80% of data scientists’ time is wasted on low-level data-cleaning work, because they lack the means to verify and share data outputs
2) Blockchain will make this possible:
A. For consumers:
Users can provide feedback (on bias, content moderation, and fine-grained output quality) as input for continuous refinement
B. For developers:
Decentralized data curation layer: crowdsourcing tedious and time-consuming data-preparation work such as data labeling
Visibility into algorithms, and the ability to compose and fine-tune them, with verifiable provenance and lineage (i.e., a tamper-proof historical record of all past changes)
Data sovereignty (implemented through content addressing/IPFS) and algorithm sovereignty (e.g., Urbit enables peer-to-peer composability and portability of data and algorithms)
The wave of variants built on open-source base models (LLMs) has generated momentum for accelerated innovation
Past ETL operations and queries can be recorded immutably on the blockchain, enabling reproducible training-data outputs (e.g., Kamu)
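As a toy illustration of that last point, consider an append-only, hash-chained log of ETL steps. Each entry commits to the entire prior history, so a recorded pipeline can be replayed and audited; the operations and paths below are made up, and a real system such as Kamu anchors this far more robustly:

```python
import hashlib
import json

GENESIS = "0" * 64
log: list[dict] = []

def record_step(operation: str, params: dict) -> str:
    """Append an ETL step whose hash commits to all previous steps."""
    prev = log[-1]["hash"] if log else GENESIS
    body = json.dumps({"op": operation, "params": params, "prev": prev},
                      sort_keys=True)
    entry = {"op": operation, "params": params, "prev": prev,
             "hash": hashlib.sha256(body.encode()).hexdigest()}
    log.append(entry)
    return entry["hash"]

record_step("extract", {"source": "s3://example-bucket/raw.csv"})   # hypothetical path
record_step("transform", {"dropna": True, "dedupe_on": "user_id"})
record_step("load", {"destination": "dataset-v2"})

# Replaying the same steps on the same inputs reproduces the same hashes,
# so any later tampering with a recorded step is detectable.
print(log[-1]["hash"])
```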
Some may argue that Web2 open-source platforms are a reasonable middle ground, but for the reasons discussed in this article they remain far from optimal.
3. Summary
The closed nature of the big tech companies makes “AI democracy” impossible; in a democratic AI ecosystem, every developer or user could contribute algorithms and data to an LLM and receive a share of the model’s future profits. AI should be accessible to, relevant to, and owned by everyone. Blockchain networks will let users provide feedback and contribute data toward model monetization, and will give developers visibility into algorithms and the ability to compose and fine-tune them with verifiable provenance and lineage. Web3 innovations such as content addressing/IPFS and Urbit will enable data and algorithm sovereignty. And with past ETL operations and queries immutably recorded on the blockchain, reproducible training-data outputs become possible.
Fourth, set up data contribution incentives
1. Problem
Today, the most valuable consumer data is the proprietary moat of the big technology platforms, and the tech giants have little incentive to share it with outside parties.
So why not just get the data straight from the source? Why not contribute our data and open source it for talented data scientists to use, making data a public good?
In short, there is no incentive or coordination mechanism for doing so. Maintaining data and performing ETL (extract, transform, load) tasks carries significant overhead. In fact, data storage alone is projected to be a $777 billion industry by 2030, and that is before counting compute costs. Why would anyone shoulder the work and cost of data pipelines for no return?
OpenAI, for example, started out open source and nonprofit but struggled to monetize. In 2019 it ultimately had to accept Microsoft’s funding and close its algorithms to the public. OpenAI is expected to generate $1 billion in revenue in 2024.
2. Solution
Web3 introduces a new mechanism, the dataDAO, which redistributes income from AI model owners to data contributors, creating an incentive layer for crowdsourced data contribution.
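A minimal sketch of the redistribution idea, with hypothetical contribution scores and a flat DAO fee; real dataDAO designs add governance, vesting, and data-quality weighting on top:

```python
def redistribute(revenue: float,
                 contributions: dict[str, float],
                 dao_fee: float = 0.10) -> dict[str, float]:
    """Split model revenue: a cut to the DAO treasury,
    the remainder pro rata by contribution score."""
    payouts = {"dao_treasury": revenue * dao_fee}
    pool = revenue - payouts["dao_treasury"]
    total = sum(contributions.values())
    payouts.update({addr: pool * score / total
                    for addr, score in contributions.items()})
    return payouts

# One month of model revenue split across three data contributors.
print(redistribute(10_000.0, {"0xAlice": 5.0, "0xBob": 3.0, "0xCarol": 2.0}))
# {'dao_treasury': 1000.0, '0xAlice': 4500.0, '0xBob': 2700.0, '0xCarol': 1800.0}
```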
Conclusion
In summary, DePIN is an exciting new category that supplies alternative fuel on the hardware side for the revival of Web3 and for AI innovation.
Although large tech companies dominate the AI industry, emerging players armed with blockchain technology have the potential to compete:
- DePIN networks lower the cost barrier to compute;
- blockchain’s verifiable, decentralized nature makes truly open AI possible;
- innovative mechanisms such as dataDAOs incentivize data contribution;
- and blockchain’s immutable, tamper-proof properties provide proof of creatorship, addressing concerns about AI’s negative social impact.