ZKML and Distributed Computing: Potential Governance Narratives for AI and Web3

About ZKML: ZKML (Zero-Knowledge Machine Learning) is a technology that combines zero-knowledge proofs with machine learning algorithms to address privacy protection in machine learning.

About Distributed Computing: Distributed computing breaks a computing task into multiple smaller subtasks and assigns them to multiple computers or processors for efficient parallel processing.

The Current State of AI and Web3: Uncontrolled Swarms and Entropy Increase

In “Out of Control: The New Biology of Machines, Social Systems, and the Economic World”, Kevin Kelly describes a phenomenon: a hive chooses its course through dance, and the largest group of dancers carries the decision. This is what Maurice Maeterlinck called the “spirit of the hive”: each bee makes its own choice and recruits other bees to confirm it, and the decision that finally emerges is genuinely the choice of the group.

Entropy increase and disorder follow the laws of thermodynamics; the textbook picture is releasing a fixed number of molecules into an empty box and measuring their final distribution. Applied to people, a swarm shaped by algorithms can likewise display group regularities: individuals think differently, yet they are often confined to a single “box” by the constraints of their era, and in the end they converge on consensual decisions.

Of course, group rules are not necessarily correct, but they do represent consensus, and the opinion leaders who can pull consensus along by their own strength are exceptional super-individuals. In most cases, however, consensus does not require everyone to agree completely and unconditionally; it only requires broad recognition within the group.

We are not discussing here whether AI will lead humanity astray; there are already plenty of such discussions. Consider the mass of AI-generated garbage that has polluted the authenticity of data on the web, or the mistakes of group decision-making that have pushed certain events toward more dangerous outcomes.

AI today tends toward natural monopoly: training and deploying large models demands enormous computing resources and data, and only a small number of enterprises and institutions have them. To each monopolist, those billions of data points are treasure; open-source sharing is out of the question, and even mutual access is impossible.

The result is an enormous waste of data: every large-scale AI project must collect user data from scratch, and the gains ultimately accrue to the winners, whether through mergers and acquisitions, the growth of individual giant projects, or the familiar land-grab logic of the traditional Internet.

Many people say that AI and Web3 are two different things with no connection. The first half is correct: they are different tracks. The second half is problematic: it is natural to use distributed technology to limit the monopoly of artificial intelligence, and to use artificial intelligence to promote the formation of decentralized consensus mechanisms.

Deriving from the bottom up: allowing AI to form a truly distributed group consensus mechanism

The core of artificial intelligence lies in humans themselves; machines and models are only guesses at, and imitations of, human thinking. A “group” is hard to abstract, because what we see every day are still concrete individuals, but a model learns and adjusts on massive amounts of data and ultimately simulates the group's form. We will not judge what kind of results such a model produces; incidents of group misbehavior have happened more than once or twice. What matters is that the model does represent the emergence of a consensus mechanism.

Take a specific DAO as an example. For governance to affect efficiency it has to be mechanized, because forming group consensus is complicated, all the more so when it involves a chain of operations such as voting and tallying. If a DAO's governance is expressed as an AI model whose data collection draws on the speech of everyone within the DAO, the decisions it outputs will in fact be closer to the group consensus.

The group consensus of a single model can be trained according to the above plan, but each such model remains an isolated island. If a collective-intelligence system forms a group of AIs, in which the models work together to solve complex problems, the effect at the consensus level grows considerably.

Small groups can either build their own ecosystems autonomously or cooperate with other groups to meet demands for very large computing power or data trading more efficiently and at lower cost. But then the problem arises: the current relationship between the various model databases is one of complete mistrust and guardedness toward everyone else. This is exactly where the natural attributes of blockchain come into play: through decentralization, truly distributed, secure, and efficient machine-to-machine interaction between AIs can be achieved.

A global intelligent brain could let AI models that were originally independent and single-purpose cooperate with one another, executing complex intelligent algorithmic processes internally and forming an ever-growing distributed consensus network. This is the greatest significance of AI's empowerment of Web3.

Privacy and data monopoly? The combination of ZK and machine learning

Whether to prevent AI from doing evil, or out of concern for privacy and fear of data monopoly, humans need targeted precautions. The most fundamental problem is that we do not know how a conclusion is reached, and the operators of the models have no intention of clarifying it. For the global intelligent brain described above, solving this problem is even more essential; otherwise no data holder will be willing to share the core of its data with others.

ZKML (Zero-Knowledge Machine Learning) applies zero-knowledge proofs to machine learning. A zero-knowledge proof (ZKP) lets a prover convince a verifier that a statement is true without revealing the underlying data.

Take a classic illustrative case. In a standard 9×9 Sudoku, the goal is to fill in the digits 1 to 9 so that each digit appears exactly once in every row, every column, and every 3×3 block. How can the person who created the puzzle prove to a challenger that it has a solution, without revealing the answer?

Cover the cells with the filled-in answer, let the challenger randomly pick a few rows or columns, shuffle the numbers in each, and then verify that each set contains exactly the digits one through nine. That is a simple manifestation of zero-knowledge proof.
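The row check above can be sketched in code. This is an illustration of the interaction only, not a cryptographic protocol; the function names and the toy board are my own.

```python
import random

def prove_row(solution, row_index):
    """Prover reveals one requested row, shuffled so the cell
    positions (the actual answer layout) are not leaked."""
    row = list(solution[row_index])
    random.shuffle(row)
    return row

def verify_row(shuffled_row):
    """Verifier checks only the multiset property: the row must
    contain each digit 1..9 exactly once."""
    return sorted(shuffled_row) == list(range(1, 10))

# A valid Sudoku solution built from a standard row-shift pattern.
solution = [[(i * 3 + i // 3 + j) % 9 + 1 for j in range(9)] for i in range(9)]

# Challenger picks a random row; prover responds; verifier accepts.
challenge = random.randrange(9)
assert verify_row(prove_row(solution, challenge))
```

The verifier learns that the chosen row is a valid permutation, but nothing about where each digit sits, which is the zero-knowledge intuition of the example.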

Zero-knowledge proofs have three properties: completeness, soundness, and zero-knowledge, meaning a conclusion is proved without any detail being revealed. Modern constructions are also succinct: building on techniques such as homomorphic encryption, verifying a proof is far cheaper than generating one.

Machine learning uses algorithms and models to let computer systems learn from data and improve themselves. By learning from experience in an automated way, a system can perform tasks such as prediction, classification, clustering, and optimization based on data and models.

The core of machine learning is to build models that can learn from data and make predictions and decisions automatically. The construction of these models usually requires three key elements: data sets, algorithms, and model evaluation. The data set is the foundation of machine learning, which includes data samples used for training and testing machine learning models. The algorithm is the core of machine learning models, defining how the model learns and makes predictions from data. Model evaluation is an important part of machine learning, used to assess the performance and accuracy of models, and to determine whether the model needs to be optimized and improved.
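As a minimal sketch of those three elements, the toy example below uses a synthetic data set split into training and test samples, a closed-form simple-linear-regression "algorithm", and mean squared error as the model evaluation; all numbers and names are illustrative.

```python
# Data set: synthetic (x, y) pairs following y ≈ 2x + 1 with small alternating noise.
data = [(x, 2 * x + 1 + ((-1) ** x) * 0.1) for x in range(20)]
train, test = data[:15], data[15:]  # training and testing samples

# Algorithm: closed-form least-squares fit on the training split.
n = len(train)
mx = sum(x for x, _ in train) / n
my = sum(y for _, y in train) / n
slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
intercept = my - slope * mx

# Model evaluation: mean squared error on the held-out split decides
# whether the model needs further optimization.
mse = sum((slope * x + intercept - y) ** 2 for x, y in test) / len(test)
assert mse < 0.05  # the fitted line generalizes to unseen points
```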

In traditional machine learning, data sets usually have to be gathered in one centralized location for training, which means data owners must hand their data to a third party and run the risk of data or privacy leakage. With ZKML, data owners can let others use their data sets without exposing the data itself, achieved by using zero-knowledge proofs.
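A full ZKML proof is beyond a short sketch, but the underlying commit-then-prove pattern can be illustrated with a plain hash commitment: the owner publishes only a digest of the data, and any later claim about that data can be checked against the digest. This shows the commitment idea only, not zero-knowledge itself; the helper names and salt are hypothetical.

```python
import hashlib
import json

def commit(dataset: list, salt: str) -> str:
    """Publish a binding digest of the data set without revealing it."""
    payload = json.dumps(dataset, sort_keys=True) + salt
    return hashlib.sha256(payload.encode()).hexdigest()

def verify_opening(dataset: list, salt: str, commitment: str) -> bool:
    """Anyone can check that a revealed data set matches the digest."""
    return commit(dataset, salt) == commitment

owner_data = [3, 1, 4, 1, 5]
c = commit(owner_data, salt="s3cret")

# The digest alone reveals nothing usable about owner_data, and an
# opening with tampered data fails verification.
assert verify_opening([3, 1, 4, 1, 5], "s3cret", c)
assert not verify_opening([3, 1, 4, 1, 6], "s3cret", c)
```

Real ZKML systems replace the reveal step with a zero-knowledge proof that a computation was performed correctly over the committed data, so the data never has to be opened at all.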

Applying zero-knowledge proofs to machine learning should have foreseeable effects, because it addresses the long-standing problems of the privacy black box and data monopoly: can a project complete proof and verification without leaking user inputs or model details, and can each collective share its data or models for joint use without leaking private data? Of course, the technology is still in its early stage and practice will surface many problems, but that does not stop us from contemplating it, and many teams are already building.

Will this let small databases free-ride on large ones? That is a governance question, and it brings us back to Web3 thinking: the essence of Crypto lies in governance. Whether a resource is heavily used or merely shared, it should receive appropriate incentives, and whether through the original PoW and PoS mechanisms or the newer PoR (Proof of Reputation), the incentive effect can be guaranteed.

Distributed Computing Power: An Innovative Narrative Intertwined with Lies and Reality

Decentralized computing power networks have long been a popular narrative in the crypto world. After all, large AI models require staggering computing power, and centralized computing networks not only waste resources but also form substantive monopolies; if the final competition comes down to nothing but GPU counts, it is simply too boring.

A decentralized computing power network essentially integrates computing resources dispersed across different locations and devices. The main advantages typically cited are: providing distributed computing capability, solving privacy issues, enhancing the credibility and reliability of artificial intelligence models, supporting fast deployment and operation in various application scenarios, and providing decentralized data storage and management. Yes, through decentralized computing power, anyone can run an AI model and test it on real on-chain data sets from users around the world, enjoying more flexible, efficient, and low-cost computing services.

At the same time, decentralized computing power can solve privacy issues by creating a powerful framework to protect the security and privacy of user data. It also provides a transparent and verifiable computing process, enhances the credibility and reliability of artificial intelligence models, and provides flexible and scalable computing resources for fast deployment and operation in various application scenarios.

Even in a fully centralized setting, the model-training process usually includes these steps: data preparation, data segmentation, inter-device data transmission, parallel training, gradient aggregation, parameter updates, synchronization, and then repeated training. Even when a centralized data center uses a high-performance cluster with high-speed interconnects to share the computing tasks, communication cost is already significant; for a decentralized computing power network, it becomes one of the biggest limitations.
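The loop above (shard the data, train in parallel, aggregate gradients, update, repeat) can be sketched for a one-parameter model. The worker partitioning and learning rate here are illustrative; in a real cluster, the gradient-aggregation line is where the communication cost is paid.

```python
# Data-parallel training sketch: fit y = w * x where the true w is 3.
data = [(x, 3.0 * x) for x in range(1, 9)]

# Data segmentation: split the data set across four "workers".
shards = [data[i::4] for i in range(4)]

def local_gradient(shard, w):
    """Each worker computes the MSE gradient on its own shard only."""
    return sum(2 * x * (w * x - y) for x, y in shard) / len(shard)

w, lr = 0.0, 0.01
for _ in range(200):  # repeated training rounds
    # Parallel training (sequential here): one gradient per worker.
    grads = [local_gradient(shard, w) for shard in shards]
    # Gradient aggregation: the communication-heavy step in practice.
    avg_grad = sum(grads) / len(grads)
    # Parameter update, then synchronization of w back to all workers.
    w -= lr * avg_grad

assert abs(w - 3.0) < 1e-3  # converges to the true parameter
```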

Therefore, although decentralized computing power networks have many advantages and much potential, given current communication costs and operational difficulties the development path still looks arduous. In practice, building such a network means overcoming many concrete technical problems: how to ensure the reliability and security of nodes, how to effectively manage and schedule dispersed computing resources, and how to achieve efficient data transmission and communication, among others.

Epilogue: Expectations for Idealists

Returning to commercial reality: the narrative of AI deeply integrated with Web3 looks beautiful, but capital and users tell us through their actions that this is destined to be an extremely difficult journey of innovation. Unless a project can embrace a powerful backer the way OpenAI did, bottomless R&D costs and an unclear business model will crush it.

Both AI and Web3 are still at a very early stage of development, much like the dot-com bubble of the late 20th century, which did not truly enter its golden age until nearly a decade later. John McCarthy once dreamed of designing an artificial intelligence with human-level intelligence over a single summer, yet nearly 70 years on we have only taken the critical first steps.

Web3+AI is the same: the direction forward has been determined to be correct, and the rest can be left to time.

As the tide of time gradually recedes, those who stand tall will be the cornerstone of our journey from science fiction to reality.