How Can DePIN Aid Artificial Intelligence? A Four-Dimensional Analysis

For a long time, startups relied on speed, flexibility, and an entrepreneurial culture to break free of organizational inertia and lead technological innovation. The era of artificial intelligence, however, has rewritten all of this: so far, the creators of breakthrough AI products have been traditional tech giants, such as Microsoft-backed OpenAI, Nvidia, Google, and even Meta.

What happened? Why did the giants beat the startups this time? Startups can write excellent code, but compared with the tech giants they face multiple obstacles.

So why is blockchain technology needed, and where does it intersect with artificial intelligence? Although it cannot solve every problem at once, the Decentralized Physical Infrastructure Network (DePIN) of Web3 creates the conditions for addressing these obstacles. The following explains how the technology behind DePIN can help artificial intelligence along four dimensions:

  • Reducing infrastructure costs

  • Verifying creators and personhood

  • Advancing AI democratization and transparency

  • Establishing a data-contribution reward mechanism

In the following text:

  • “Web3” refers to the next generation of the Internet, where blockchain technology and other existing technologies are organic components.

  • “Blockchain” refers to decentralized and distributed ledger technology.

  • “Crypto” refers to the use of token mechanisms for incentives and decentralization.

1. Reduce Infrastructure Costs (Computing and Storage)

Every wave of technological innovation is ushered in by something expensive becoming cheap enough to waste.

–“Society’s Technical Debt and Software’s Gutenberg Moment,” SK Ventures

Just how important affordable infrastructure is (the hardware costs of computing, transmission, and data storage) can be seen in Carlota Perez’s theory of technological revolutions, which proposes that technological breakthroughs unfold in two phases:

Source: Carlota Perez’s theory of technological revolutions

  • The installation phase, characterized by abundant risk capital, infrastructure build-out, and a “push” go-to-market (GTM) strategy, because customers do not yet understand the value proposition of the new technology.

  • The deployment phase, characterized by a surge in infrastructure supply that lowers the barrier for new entrants, and by a “pull” go-to-market (GTM) strategy, indicating strong product-market fit with customers pulling for products that do not yet exist.

Now that products such as ChatGPT have proven product-market fit and customer demand, one might assume AI has entered the deployment phase. However, AI still lacks a key ingredient: surplus infrastructure supply that price-sensitive startups can build on and experiment with.

Problem

The physical infrastructure market today is dominated by vertically integrated oligopolies, including AWS, GCP, Azure, Nvidia, Cloudflare, and Akamai, which earn high margins: AWS is estimated to run a 61% gross margin on commoditized computing hardware. New entrants in AI, especially in the LLM space, therefore face extremely high computing costs.

  • A single ChatGPT training run is estimated to cost $4 million, and hardware inference costs about $700,000 per day.

  • The next version of BLOOM may require $10 million to train and retrain.

  • If ChatGPT-style search were integrated into Google Search, Google’s income could fall by $36 billion, with huge profits shifting from the software platform (Google) to the hardware provider (Nvidia).

Source: Layer-by-Layer Analysis – LLM Search Architecture and Cost

Solution

DePIN networks such as Filecoin (the original DePIN pioneer, founded in 2014 to aggregate internet-scale hardware for decentralized data storage), Bacalhau, Gensyn.ai, Render Network, and exaBITS (a coordination layer matching CPU/GPU supply and demand) can cut infrastructure costs by 75% to 90% or more in three ways:

1. Pushing out the supply curve and stimulating market competition

DePIN gives hardware suppliers an equal opportunity to become service providers. It creates a market where anyone can join as a “miner,” exchanging CPU/GPU or storage capacity for economic rewards, thereby putting competitive pressure on incumbent providers.

Although companies like AWS undoubtedly enjoy a 17-year first-mover advantage in user experience, operations, and vertical integration, DePIN attracts the new users who cannot accept centralized providers’ pricing. Just as eBay does not compete head-on with Bloomingdale’s but meets similar needs at more affordable prices, distributed storage networks do not replace centralized providers but aim to serve price-sensitive user segments.

2. Promoting market equilibrium through cryptoeconomic design

The subsidy mechanism DePIN creates encourages hardware suppliers to participate in the network, which in turn reduces costs for end users. To see how, compare the costs and revenues of storage providers in Web2 (AWS) and Web3 (Filecoin).

Customers get discounts: DePIN networks create a competitive marketplace with Bertrand-style competition, driving down the prices customers pay. By contrast, AWS EC2 needs roughly a 55% product margin and a 31% overall margin to sustain its operations. Token incentives/block rewards also give providers a new revenue source: on Filecoin, storage providers earn block rewards (tokens) for hosting more real data, so they are motivated to attract more customers and grow revenue (a toy model after the list below illustrates the mechanism). The token structures of several emerging compute-focused DePIN networks are not yet disclosed, but they are likely to follow a similar pattern. Such networks include:

  • Bacalhau: A coordination layer that brings computation to where data is stored, avoiding the movement of large volumes of data.

  • exaBITS: A distributed computing network that serves AI and compute-intensive applications.

  • Gensyn.ai: A deep learning model computation protocol.
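To make the subsidy mechanism concrete, here is a toy model in Python. All the numbers are hypothetical assumptions for illustration (they are not actual Filecoin or AWS figures); the point is only to show how block rewards let a provider charge customers less than a margin-driven incumbent while staying profitable:

```python
# Toy model of DePIN's cryptoeconomic subsidy (all numbers hypothetical).

HARDWARE_COST = 100.0   # provider's monthly cost to serve one unit of capacity
CLOUD_MARGIN = 0.55     # e.g., the ~55% product margin attributed to AWS EC2 above

# A centralized provider recovers cost plus margin from the customer alone.
cloud_price = HARDWARE_COST / (1 - CLOUD_MARGIN)

# A DePIN provider also earns block rewards for hosting real data,
# so part of its revenue does not have to come from the customer.
BLOCK_REWARD = 40.0     # hypothetical token revenue per unit of real data hosted
TARGET_PROFIT = 20.0    # profit the provider wants per unit

depin_price = HARDWARE_COST + TARGET_PROFIT - BLOCK_REWARD

print(f"centralized price: {cloud_price:.2f}")                    # 222.22
print(f"DePIN price:       {depin_price:.2f}")                    # 80.00
print(f"customer discount: {1 - depin_price / cloud_price:.0%}")  # 64%
```

Note that the discount is funded by token issuance, so it holds only as long as the block rewards retain value; this is the equilibrium the cryptoeconomic design has to maintain.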

3. Reducing indirect costs

Advantages of DePIN networks such as Bacalhau, exaBITS, and IPFS/content-addressed storage include:

  • Unlocking latent data: Because moving large datasets across the network is bandwidth-expensive, a great deal of data currently goes unused, such as the streams of event data generated by sports venues. DePIN projects can process such data on-site and transmit only the meaningful outputs, unlocking this latent data.

  • Lowering operating costs: Processing data where it is generated cuts data ingress, transfer, and import/export costs.

  • Minimizing manual work in sensitive-data sharing: If hospitals A and B need to combine their patients’ sensitive data for an analysis, they can use Bacalhau to coordinate GPU compute over the sensitive data in place, without the cumbersome administrative procedures of exchanging personally identifiable information (PII).

  • No need to recompute base datasets: IPFS/content-addressed storage natively deduplicates, traces, and verifies data, as the sketch after this list illustrates. For the features and cost-effectiveness of IPFS, refer to this article.
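To see why content addressing gives deduplication and verification for free, consider this minimal sketch; a plain SHA-256 hash stands in for IPFS’s multihash-based CIDs, and an in-memory dict stands in for the network:

```python
import hashlib

def content_address(data: bytes) -> str:
    """The address is derived from the content itself (IPFS uses multihash
    CIDs; plain SHA-256 stands in for that here)."""
    return hashlib.sha256(data).hexdigest()

store: dict[str, bytes] = {}

def put(data: bytes) -> str:
    cid = content_address(data)
    store[cid] = data  # storing identical bytes twice is a no-op: free deduplication
    return cid

def get(cid: str) -> bytes:
    data = store[cid]
    # Anyone can re-hash the content to verify it matches its address.
    assert content_address(data) == cid, "content failed verification"
    return data

a = put(b"training-batch-001")
b = put(b"training-batch-001")      # a duplicate upload...
assert a == b and len(store) == 1   # ...is stored exactly once
```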

AI-generated summary: AI needs the affordable infrastructure that DePIN provides, while the infrastructure market is currently monopolized by vertically integrated oligopolies. DePIN networks such as Filecoin, Bacalhau, Render Network, and exaBITS democratize the opportunity to become a hardware supplier, introduce competition, maintain market equilibrium through cryptoeconomic design, lower costs by 75%-90% or more, and reduce indirect costs.

2. Verify Creators and Personhood

Problem

A recent survey shows that 50% of AI researchers believe there is more than a 10% chance that AI causes catastrophic harm to humanity.

People need to be wary of the social disruption caused by AI, which still lacks regulation and technical standards, a situation known as a “reverse salient.” For example, in this Twitter video, podcast host Joe Rogan and conservative commentator Ben Shapiro debate the movie “Ratatouille,” yet the video is entirely AI-generated. Source: Bloomberg

It is worth noting that the social impact of AI goes far beyond fake blogs, conversations, and images:

  • Heading into the 2024 U.S. presidential election, AI-generated deepfake campaign content has achieved lifelike quality for the first time.

  • A video of Senator Elizabeth Warren was edited to make her “say” that Republicans should not be allowed to vote (since debunked).

  • A synthesized clip of Biden’s voice was made to criticize transgender women.

  • A group of artists filed a class-action lawsuit against Midjourney and Stability AI, accusing them of using artists’ works without authorization to train AI, infringing on copyrights, and threatening artists’ livelihoods.

  • The AI-generated song “Heart on My Sleeve,” imitating the voices of The Weeknd and Drake, went viral on streaming platforms before being taken down. When new technologies enter the mainstream without regulation, many problems arise, including copyright infringement; this is the “reverse salient” problem again.

So can we add AI-related specifications to Web3?

Solution

Use cryptographic on-chain provenance as proof of personhood and creatorship

This is where blockchain technology can truly play its role: as a distributed ledger of immutable on-chain history, it can verify the authenticity of digital content through cryptographic proofs over that content.

Digital signatures as proof of creatorship and personhood

To identify deepfakes, a cryptographic proof can be generated from a unique digital signature by the original content creator. The signature is created with a private key known only to the creator and verified with a public key available to everyone. A signature proves that the content was created by the original creator, whether human or AI, and makes any change to the content, authorized or not, detectable.
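A minimal sketch of the sign-and-verify flow described above, using Ed25519 keys via the widely used `cryptography` Python package (an illustrative choice; no specific signature scheme is prescribed here):

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

creator_key = Ed25519PrivateKey.generate()  # known only to the creator
public_key = creator_key.public_key()       # published for anyone to verify against

content = b"original video, audio, or text bytes"
signature = creator_key.sign(content)       # distributed alongside the content

# Anyone can confirm the content really came from the key holder...
public_key.verify(signature, content)       # passes silently

# ...and any tampering breaks the proof.
try:
    public_key.verify(signature, content + b" [deepfaked]")
except InvalidSignature:
    print("tampered content detected")
```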

Proof of Authenticity Using IPFS and Merkle Trees

IPFS is a distributed protocol that uses content addressing and Merkle trees to reference large datasets. To prove that a specific data block belongs to a file, or that the file has been altered, a Merkle proof is generated: a chain of hashes showing that block’s position in the Merkle tree. Any change to a block propagates up to the root hash, providing verifiable evidence of modification.
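The toy implementation below illustrates the mechanics with a simplified binary Merkle tree over SHA-256 (not IPFS’s actual DAG layout): an inclusion proof ties one data block to the root hash, and any edit to the block invalidates the proof:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:          # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, str]]:
    """Sibling hashes (tagged left/right) on the path from a leaf to the root."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], "left" if sibling < index else "right"))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf: bytes, proof: list[tuple[bytes, str]], root: bytes) -> bool:
    node = h(leaf)
    for sibling, side in proof:
        node = h(sibling + node) if side == "left" else h(node + sibling)
    return node == root

blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
root = merkle_root(blocks)
proof = merkle_proof(blocks, 2)
assert verify(b"block-2", proof, root)              # the block belongs to the file
assert not verify(b"block-2 edited", proof, root)   # any edit changes the root
```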

The pain point of cryptographic schemes is the incentive mechanism: identifying deepfake creators reduces harm to society, but it brings no comparable economic benefit. That responsibility is therefore likely to fall on the mainstream media distribution platforms such as Twitter, Meta, and Google, and indeed it has. So why do we need blockchain?

The answer is that blockchain’s cryptographic signatures and proofs of authenticity are more effective, verifiable, and deterministic. Today, deepfake detection relies mainly on machine learning algorithms (such as Meta’s “Deepfake Detection Challenge,” Google’s “Asymmetric Numeral Systems” (ANS), and C2PA: https://c2pa.org/) that look for patterns and anomalies in visual content, but these are often inaccurate and lag behind the pace of deepfake development. Manual review is usually needed to determine authenticity, which is slow and expensive.

If one day every piece of content carries a cryptographic signature, anyone will be able to verify its provenance and flag tampering or forgery, and we will be far better off for it.

AI-generated summary: AI may pose a significant threat to society, especially deepfakes and unauthorized content use, while Web3 technologies, such as creator proofs using digital signatures and proof of authenticity using IPFS and Merkle trees, can verify the authenticity of digital content, prevent unauthorized changes, and provide standards for AI.

3. AI Democratization

Problem

Today’s AI is a black box of proprietary data and algorithms. The closed-source LLMs of the large tech companies stifle my vision of “AI democracy,” in which every developer, and even every user, can contribute algorithms and data to an LLM model and receive a share of the profits when the model succeeds (related article).

AI Democracy = Visibility (being able to see the input data and algorithms of the model) + Contribution (being able to contribute data or algorithms to the model).

Solution

The goal of AI democracy is to make generative AI models open to the public, relevant to the public, and owned by the public. The following compares the current state of AI with the future achievable through Web3 blockchain technology.

Current State:

For customers:

  • Receive one-way LLM output

  • Cannot control how personal data is used

For developers:

  • Low composability

  • ETL data processing is not traceable and difficult to reproduce

  • Data contributions come only from organizations that already own data

  • Closed-source models can only be accessed through API payments

  • Sharing data outputs lacks verifiability

  • 80% of data scientists’ time is spent on low-level data cleaning

After Combining with Blockchain:

For customers:

  • Users can provide feedback (such as on bias, content moderation, and output granularity) as a basis for fine-tuning

  • Users can choose to contribute data in exchange for a share of the model’s profits

For developers:

  • Distributed Data Management Layer: Crowdsource time-consuming data labeling and other data preparation work

  • Visibility: Ability to compose and fine-tune algorithms with verifiable provenance (a tamper-proof history of every change)

  • Data sovereignty (implemented through content addressing/IPFS) and algorithm sovereignty (e.g., Urbit enables peer-to-peer composability and portability of data and algorithms)

  • Accelerated LLM innovation: Many variants can branch quickly from base open-source models

  • Reproducible training-data outputs: Achieved through blockchain’s immutable records of past ETL operations and queries (such as Kamu); a minimal sketch of the idea follows this list
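As a minimal sketch of that reproducibility idea (not Kamu’s actual format; the step names and parameters below are hypothetical), each ETL operation can be hashed together with its predecessor, so one published hash pins down exactly how a dataset was produced:

```python
import hashlib
import json

def step_hash(prev_hash: str, operation: str, params: dict) -> str:
    """Chain each ETL step to its predecessor, like a block to its parent."""
    record = json.dumps(
        {"prev": prev_hash, "op": operation, "params": params}, sort_keys=True
    )
    return hashlib.sha256(record.encode()).hexdigest()

# Replaying the same steps yields the same chain head, so publishing the head
# hash on-chain commits to the full provenance of the training data.
head = "genesis"
for op, params in [
    ("extract",   {"source": "s3://raw-events", "snapshot": "2023-05-01"}),
    ("transform", {"script_sha256": "<hash of cleaning script>", "filter": "valid == true"}),
    ("load",      {"format": "parquet", "partitions": 16}),
]:
    head = step_hash(head, op, params)

print("pipeline fingerprint:", head)  # differs if any step or parameter differs
```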

Some argue that open-source platforms in Web 2.0 offer a compromise, but the results have been unsatisfying, as discussed in exaBITS’ blog post.

AI-generated summary: Closed-source LLMs from large tech companies stifle “AI democracy,” in which every developer or user could contribute algorithms and data to an LLM model and receive a share of the profits when the model generates revenue. AI should be open to the public, relevant to the public, and owned by the public. With a blockchain network, users can provide feedback and contribute data to models in exchange for a share of the profits, while developers gain visibility and verifiable provenance to compose and fine-tune algorithms. Web3 innovations such as content addressing/IPFS and Urbit will realize data and algorithm sovereignty. Blockchain’s immutable records of past ETL operations and queries will also make training-data outputs reproducible.

4. Establish a Data Contribution Reward Mechanism

Problem

Today, the most valuable consumer data belongs to large tech companies as a proprietary asset and forms their core competitive moat. The giants have no incentive to share this data with outside parties.

So why can’t we get data directly from its creators, the users? Why can’t we turn data into a public resource that people contribute to and that data scientists can openly use?

Simply put, because of a lack of incentive and coordination mechanisms. Maintaining data and performing ETL (extract, transform, load) carries significant indirect costs: data storage alone is projected to be a $777 billion industry by 2030, not counting compute. No one will bear the cost of data processing for free.

Take OpenAI: it was initially set up as a non-profit, open-source organization, but monetization proved too difficult to cover its costs. In 2019, OpenAI had to accept funding from Microsoft, and its algorithms are no longer open to the public. OpenAI is expected to generate $1 billion in revenue in 2024.

Solution

Web3 introduces a new mechanism called the “dataDAO,” which redistributes income between AI model owners and data contributors and creates an incentive layer for crowdsourced data contribution. Space does not allow a full explanation here, but the two articles below go deeper:

  • How DataDAO works, by HQ Han of Protocol Labs

  • How data contribution and monetization works in Web3, in which I discuss the mechanics, shortcomings, and opportunities of dataDAOs in depth

Overall, DePIN supplies new hardware fuel for Web3 and AI innovation. Although the tech giants dominate the AI industry, emerging players can compete by leveraging blockchain technology: DePIN lowers entry barriers by cutting computing costs; the verifiable, decentralized nature of blockchain makes truly open AI possible; innovative mechanisms such as the dataDAO incentivize data contribution; and blockchain’s immutability and tamper resistance enable creator identity verification, easing concerns about AI’s negative social impact.