Nuklai - Network Economics
by
Nuklai

Introduction

Imagine a data economy where data is not locked idle in silos, but instead is actively in motion between its participants for various use cases: every byte can be put to good use and every data point carries the potential to bring new innovations. Nuklai's mission is to establish a unified data landscape that is inclusive and interoperable.

Nuklai introduces a novel modular decentralized network that powers a public data sharing ecosystem and allows the deployment of custom private or semi-private data-sharing networks (data consortiums). This network economics paper presents the Nuklai network and outlines the role "NAI" plays in it as a native network token.

Challenges and opportunities

Competitive advantage through data

Many industries have faced disruption of traditional business models in recent years, and in the coming years many more will follow as startups innovate more rapidly than ever. After OpenAI took the world by storm with ChatGPT, artificial intelligence will play its own disruptive role in virtually every industry. Traditional businesses are forced to protect existing business models and explore new ones in order to stay ahead of the competition. However, there is something these traditional enterprises all have in common: they have gathered very large amounts of data.

All companies and individuals generate data, day in and day out. We generate data by using our phones, taking public transport, driving our cars or shopping for groceries. This generated data usually serves a clear purpose, like targeted advertising, optimizing the availability of buses and trains, reporting congestion or building a purchasing strategy.

Beyond its original scope, this data is largely ignored and remains locked away on private servers. Currently, fragmentation of the data landscape leads to a high barrier to monetize data: you'll need to use different tools to access data from different sources, build custom connectors or find the right ingestion tool to combine the data from these different sources. Moreover, you'll face expensive business intelligence platforms that have more features than your organization will ever need in order to leverage the insights that come from this data. Even just building out a pilot project to test the waters can become a lengthy and costly process quickly, defeating its purpose entirely. While data should be an organization's asset, it's becoming a liability.

Enterprises that look for new business models in order to stay ahead need to leverage their vast amounts of data, but face these persistent challenges when trying to experiment and adapt quickly.

The rise of Large Language Models (LLMs)

LLMs are trained on massive amounts of unstructured data. This generally makes them pleasant conversational partners that can assist you in many useful ways; they can even code for you. The downside of current LLMs is that when it comes to fact-based information, details are often hallucinated.

Incorporating structured data into LLM training can significantly enhance their capabilities in fact-based conversations and reasoning. This adaptation could lead to the emergence of new use cases for LLMs, such as more sophisticated analytical tasks and specialized professional consultations, catering to industries like legal, healthcare, and finance, where accuracy and up-to-date information are crucial.

The challenge lies in the fragmented nature of data landscapes. Accessing structured data feeds is difficult due to their varied formats and the necessity for numerous custom connectors. This fragmentation hampers the integration process and consequently complicates the task of rewarding data feed owners fairly and transparently.

Creating data-sharing ecosystems that integrate with LLMs can address these issues and unlock great potential for niche use cases. Such ecosystems would allow for efficient and equitable data exchange, leveraging an LLM's ability to provide accurate, context-aware insights. For instance, in healthcare, real-time patient data can enable LLMs to offer better diagnostic support, while in finance, up-to-the-minute market data can lead to more accurate financial forecasting.

To fully exploit these integrations, third-party developers would require easy-to-use APIs. These APIs would enable seamless integration of AI capabilities into existing systems, allowing businesses to leverage advanced LLM functionalities without the need for extensive technical expertise.

Lastly, the cost of training custom models or running inference on large datasets can be prohibitive, particularly for small and medium-sized businesses (SMBs). This necessitates a shift towards distributed computing power, which would democratize access to advanced AI capabilities, allowing SMBs to compete on a level playing field with larger corporations.

How Nuklai addresses these challenges and opportunities

Nuklai revolutionizes data management and utilization in a way that seamlessly blends with the needs of modern businesses. One of the platform’s core strengths is its ability to effortlessly upload and store datasets of different formats, automatically structuring them into an efficient, generalized format. This uniformity ensures that when users access multiple datasets, they encounter a consistent interface, significantly simplifying data manipulation and analysis.

The platform's capability to combine multiple datasets opens possibilities for generating new insights. Such combinations allow for the exploration of connections and trends that were previously undiscoverable due to the isolation of these datasets. This feature is particularly revolutionary, as it enables the synthesis of knowledge from diverse domains in a single query, unlocking entirely new possibilities and insights.

Nuklai also offers opportunities for external contributors to monetize their skills. These contributors can enhance the platform by enriching the metadata of datasets. This richly described metadata is then more effectively utilized, for example in LLM integrations or in AI-driven analyses to draw connections between seemingly unrelated datasets.

The platform further simplifies the data analysis process with its visual data pipeline editor. This tool allows users to create data pipelines for deriving insights from combined datasets without the need for expertise in SQL, Python, or similar languages, making advanced data analysis attainable for a broader range of users.

Nuklai is built on a foundation of fairness and inclusivity. Contributors of data and metadata are rewarded appropriately and transparently, ensuring a sustainable and thriving community-driven collaborative ecosystem. Additionally, the platform's LLM integrations, which can be trained or run inference using distributed computing power, enable users to interact with their own or others’ data, bringing a more intuitive and human dimension to data analysis.

Nuklai addresses the contemporary challenges of fragmentation of the data landscape and accessibility by offering a collaborative, community-driven, unified, efficient, and user-friendly platform that will power the next generation of LLMs. It empowers businesses of all sizes to leverage the full potential of their data, enabling them to innovate and compete more effectively.

Why decentralization matters

Existing data sharing networks and intelligence platforms heavily rely on centralized services. When businesses share data, especially with potential competitors, they put their valuable assets at risk. Data sharing consortiums need a high level of trust among all network participants, which is challenging to establish and maintain. Who will own and maintain the infrastructure needed? Who is appointed to control accounting, and how will fraudulent activity be detected? Decentralization offers a solution by removing the need for trust. In a decentralized network, participants can engage confidently, knowing they remain in control of their data and their interests are protected.

Nuklai is an ecosystem that, unlike traditional platforms that rely on central entities for operation and maintenance, is designed to be self-sustaining, ensuring its longevity and resilience. In a decentralized ecosystem like Nuklai, the absence of reliance on central parties for backend services is a significant advantage. This structure guarantees that the ecosystem remains operational and efficient, even if individual companies within it face challenges or cease operations.

Contributors from outside the network contribute to datasets and enhance metadata. Setting up an individual agreement between each contributor and data provider to ensure that every contributor is fairly rewarded would be impractical. Decentralizing these agreements through code and tracking the contributions onchain ensures that rewards are transparently distributed as agreed, without having to trust an intermediary.

Nodes

The network of nodes and subnets serves as a way to secure the network and also as a way of distributing compute power to network participants that need to run big data pipelines or train custom large language models using the data in Nuklai's ecosystem.

Compute Nodes: Serve as the distributed computational power of the Nuklai network. They provide CPU and GPU power that can perform a variety of complex tasks such as training custom AI (like LLMs). Those that run compute nodes get compensated in NAI for sharing their idle resources.

Validator Nodes: Function as the auditors, looking for errors or attempts to compromise the network with false information. They ensure the integrity of computations and transactions that take place within the network. Validator nodes also ensure that the network's specific tasks are executed and they are responsible for the emission balancer.

Compute Nodes

The nodes within Nuklai's ecosystem provide distributed computing power, capable of managing extensive data pipelines or training custom large language models. For instance, in the field of meteorology, this computational capacity is instrumental in enhancing weather forecasting and climate modeling. Such advancements are vital for sectors like agriculture and disaster management. Similarly, in healthcare, the distributed network supports federated learning, facilitating the development of medical AI models while prioritizing patient privacy.

For small and medium-sized businesses, which typically lack the resources to operate and maintain advanced technological infrastructures, this distributed compute network opens the door to a wide array of applications that were previously inaccessible.

Compensation for compute node operators is issued in NAI tokens. Upon the initiation of a compute power request, the requisite amount of NAI is determined and reserved until the task is completed and verified. A portion of each transaction, specifically 25%, is allocated to the emission balancer protocol.
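As an illustrative sketch of this flow (in Python; the function name and amounts are assumptions for illustration, not the actual protocol code), the settlement of a compute request could look like this:

```python
EMISSION_BALANCER_SHARE = 0.25  # 25% of each compute payment goes to the emission balancer

def settle_compute_request(reserved_nai: float, task_verified: bool):
    """Illustrative settlement of a compute power request.

    The requester's NAI is reserved up front; on verified completion,
    25% goes to the emission balancer and the rest to the node operator.
    Returns (operator_reward, emission_balancer_amount, refund).
    """
    if not task_verified:
        # Task failed verification: the reserved NAI is returned to the requester.
        return 0.0, 0.0, reserved_nai
    to_balancer = reserved_nai * EMISSION_BALANCER_SHARE
    to_operator = reserved_nai - to_balancer
    return to_operator, to_balancer, 0.0

# A 1,000 NAI task that completes and verifies successfully:
print(settle_compute_request(1000.0, True))  # (750.0, 250.0, 0.0)
```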

Participation as a compute node requires operators to stake NAI tokens (500,000 NAI). This stake is at risk of being reduced, or 'slashed', should the node exhibit unreliable outputs or suffer from recurring or significant downtime.

Validator Nodes

The validators of the network ensure the decentralization and thus the security of the network. Their primary role is to validate transactions and ensure that each action on the network adheres to the network's protocol and rules. Additionally, they are tasked with confirming the compute resources spent by the compute nodes, running the emission balancer and executing network-specific tasks, such as:

  • Distributing dataset revenue to all stakeholders, ensuring that contributors are fairly rewarded
  • Recording actions, like querying datasets and data pipeline executions, to ensure that results can be traced back to their origins
  • Enforcing access control

The minimum stake for node operators is set at 1.5 million NAI, which must be maintained for a minimum duration of six months to avoid slashing penalties. For the initial 100 nodes, the annual percentage rate (APR) on their stake is set at 25%. Beyond this threshold, the APR is proportionately distributed among all nodes. Consequently, with 200 validator nodes, the APR would be 12.5%. In the network's initial phase, validator nodes require authorization before joining the network.
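Under these rules, the validator APR as a function of node count can be sketched as follows (illustrative Python; the exact on-chain formula may differ):

```python
BASE_APR = 25.0        # % APR for the first 100 validator nodes
BASE_NODE_COUNT = 100

def validator_apr(node_count: int) -> float:
    """APR per validator node: 25% up to 100 nodes; beyond that,
    the same total reward budget is spread proportionally over all nodes."""
    if node_count <= BASE_NODE_COUNT:
        return BASE_APR
    return BASE_APR * BASE_NODE_COUNT / node_count

print(validator_apr(100))  # 25.0
print(validator_apr(200))  # 12.5 - matching the example in the text
```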

Transaction fees

With each transaction on the network that mutates the state of the blockchain, a transaction fee needs to be paid to reward the nodes for validating and executing these transactions. These fees are paid in NAI. Transaction fees bear a minimum, determined by the units of work that will take place, but can be increased to convince a node to pick up the transaction before others when quick execution is important. Of each transaction fee paid, 50% is automatically submitted to the emission balancer. The remainder is used to reward the nodes for validating the network.

Network Curators

The Nuklai network requires active participation by its curators. These individuals or companies play a significant role in adding value to the network by introducing new data, curating existing data, reviewing the transactions that take place and the contributions that have been made, and ensuring that ethical standards are upheld. Newly sourced data acts as a diverse source of information for data analysis and applications. Since these individuals or companies spend significant resources introducing unique, relevant and often hard-to-source datasets, they bring tremendous value to the network. This data can then be used within advanced analytics, machine learning or the training of other artificial intelligence models such as large language models.

By introducing new unique data, curators help maintain the network's value. In addition to those that contribute data, curators acting as 'contributors' increase the utility of existing datasets by deeply enriching their metadata. This process contextualizes the data by annotating and tagging it in various ways, increasing the accessibility, interpretability and applicability of the data. By doing this work, curators increase the value of the raw data and transform it into a more valuable asset.

Curators also combine different datasets with each other, thereby unveiling new insights and correlations. The synthesis of these diverse datasets results in new datasets of their own. Such datasets may reveal patterns and trends that were hidden before, opening the door for potential innovation and problem solving. Curators are key players that are driving the network's value.

Network Utility


As a decentralized network that powers the new data economy, the Nuklai network requires a network token to function. This network token has several utilities within the ecosystem:

Means of access and toll: each transaction that is done on the network is registered and validated. Participants of the network pay a fee for the usage of the network. This fee is paid with the NAI token.

Means of reward: contributors to the network are incentivized through NAI tokens in order to bootstrap the network and increase the level of decentralization of the network.

Means of data-control: when data is shared within (semi) private consortium networks, new Nuklai subnets are deployed and are required to be connected and secured by the main Nuklai network. In order to secure and validate these subnets, NAI is required as well.

Means of compute power: participants that add computational power to the network, which others use to execute large data pipelines or to train their AI or custom large language models, are rewarded in NAI for providing that power. Depending on how long and how complex these computations are, the rewards in NAI will vary.

Means of governance: participants in the network determine its future, and the token serves as a democratic and decentralized means of making decisions that are in the best interest of the network's participants.

Network Token Distribution

The Nuklai network follows a system in which inflation is reduced over time by gradually lowering the block rewards. Each month the inflation rate drops according to the activity on the network, until the maximum supply of 10 billion NAI tokens is reached.

On top of that, the Nuklai network incentivizes sufficient decentralization by rewarding the nodes that provide the decentralized security it needs. The token distribution of the network is optimized for both node decentralization and the formation of the DAO treasury.

Token distribution: NAI will be launched with an initial supply at genesis of 853 million NAI. An Emission Balancer is implemented as a mechanism to avoid unlimited growth, ensuring stabilization of the maximum supply at 10 billion NAI, assuming sufficient utilization of the main network and the computational nodes.
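A minimal sketch of such a capped, decaying emission schedule (illustrative Python; the starting monthly emission and the 1% decay rate are assumptions for illustration, not Nuklai's actual parameters):

```python
MAX_SUPPLY = 10_000_000_000   # 10B NAI hard cap
GENESIS_SUPPLY = 853_000_000  # initial supply at genesis

def emit_month(current_supply: float, scheduled_emission: float) -> float:
    """Mint the scheduled emission, clipped so that total supply
    can never exceed the 10B NAI maximum."""
    minted = min(scheduled_emission, MAX_SUPPLY - current_supply)
    return current_supply + minted

# Illustrative decaying schedule: each month emits 1% less than the last.
supply, emission = GENESIS_SUPPLY, 50_000_000  # starting emission is an assumption
for _ in range(1200):  # simulate 100 years of months
    supply = emit_month(supply, emission)
    emission *= 0.99

print(supply <= MAX_SUPPLY)  # True - supply approaches but never exceeds the cap
```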

Nuklai DAO

The DAO will be established to give each stakeholder in the ecosystem a voice and a vote in decision making for the future of the Nuklai network. Any holder of NAI is able to make a proposal. Where validators provide technical decentralization, the DAO makes sure that the governance of the network is decentralized. It is foreseen that there are a few important topics that the DAO can decide upon:

  • Spending & Budget limits of the DAO can be increased or decreased depending on the DAO budget.
  • The maximum APY validator nodes receive at a given time.
  • DAO token allocations to community incentives, marketing and business development and further technical development of the ecosystem.

The DAO will receive its first allocation of NAI at the inception of mainnet. It will continuously receive NAI until a total allocation of 1.3B is emitted and can only be utilized by the DAO itself.

Emission Balancer

In order to balance the token emissions of the network and achieve a theoretical maximum supply, mechanisms to reduce the emission of tokens will be put in place:

  • When computational nodes are used within the network, 25% of their income will be used within the emission balancer.
  • 50% of all transaction fees will be subject to the emission balancer at all times.

Example:

  • 50% of all transaction fees and 25% of the computational income accrue in the Emission Balancer, totaling 3M tokens over the coming month.
  • 2M tokens in validator rewards are due over that same month.
  • Instead of minting 2M new tokens, the Emission Balancer's treasury is used to reward the validator nodes.
  • 1M tokens are left in the Emission Balancer, which resets to 0 in the next Emission Balancer epoch (approximately one month).

Note that even though the example uses a period of a month, new token emissions and emission balancer transactions take place per block.
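The epoch logic in the worked example above can be sketched as follows (illustrative Python; the function name is hypothetical):

```python
def epoch_rewards(balancer_accrued: int, validator_rewards_due: int):
    """One emission-balancer epoch: pay validator rewards from the
    accrued treasury first, and mint new NAI only for any shortfall.
    Returns (newly_minted, leftover); the leftover resets next epoch."""
    from_treasury = min(balancer_accrued, validator_rewards_due)
    newly_minted = validator_rewards_due - from_treasury
    leftover = balancer_accrued - from_treasury
    return newly_minted, leftover

# The worked example: 3M NAI accrued, 2M NAI due to validators.
print(epoch_rewards(3_000_000, 2_000_000))  # (0, 1000000) - no new tokens minted
```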

Introducing a new data economy

The following scenarios illustrate how the Nuklai network will create a new data economy and will showcase the utility of the network token.

Data Consortium

Data Consortiums are established by multiple organizations that benefit from sharing data with other consortium members. Within these consortiums there is a need to control the data that is shared and who has access to which datasets, which can be limited to a set of approved partners. Some of the industries where data consortiums are likely to be formed are within the web3 ecosystem, the agricultural technology sector, the automotive industry, and many others.

An example of a data consortium would be a car manufacturer ecosystem, where various companies in the supply and value chain work together with each other. The car manufacturer can share data (either for free or paid) with dealerships and vice versa. Other data might be shared with start-ups that are working to improve some technological innovation they are working on or to train an AI. Data usage can be tracked and micro-payments can take place where relevant.

Data collaborations & curation

Companies and individuals alike can contribute to datasets, as it's tracked who provided what data or metadata. Data collaborators can work together to enhance the metadata of datasets, even optimizing it for different specific purposes (human-readability, AI integration, etc.). The incremental benefit this brings is that smaller companies, or even groups of individuals, can pool their data together and monetize it. This creates a level playing field against larger competitors within the data space.

Access to data

Capgemini research showed that the majority of global enterprises have large amounts of datasets that remain underutilized. At the same time, companies are looking outside their organization for valuable data sources for various reasons, such as improving their machine learning or artificial intelligence models or providing their large language models with structured data, all in order to get better business insights and forecasting. With the introduction of Nuklai we intend to break down the barriers that currently exist when companies want to monetize their underutilized data or purchase data from third parties.

Access to computing power (computational nodes)

Compute power is increasingly becoming a critical resource for businesses and researchers, especially in fields that require intensive computational tasks such as federated learning, machine learning and the training of artificial intelligence models. A robust and scalable computational resource is required to process large amounts of data, refine algorithms and perform complex statistical simulations. This is especially pronounced in the field of artificial intelligence and machine learning, where the training and fine-tuning of models require substantial computational capacity.

Federated learning, an approach to machine learning that trains algorithms across multiple decentralized devices or servers, further propels this demand, necessitating significant distributed computing power.

Example: submitting a query to a dataset through Nuklai's public API.

JavaScript:

const ApiUrl = "https://api.nukl.ai/api/public/v1/datasets/:datasetId/queries";
const ApiKey = "[API_KEY]";
const DatasetId = "[DATASET_ID]";

const headers = {
  "Content-Type": "application/json",
  "authentication": ApiKey
}

// @dataset represents your dataset rows as a table
const body = {
  sqlQuery: "select * from @dataset limit 5",
}

// make request
fetch(ApiUrl.replace(':datasetId', DatasetId), {
  method: "POST",
  headers: headers,
  body: JSON.stringify(body), // convert to json object
})
  .then((response) => response.json())
  .then((data) => {
    console.log(data);
  })
  .catch((error) => {
    console.error(error);
  });
Python:

import requests
import json

ApiUrl = "https://api.nukl.ai/api/public/v1/datasets/:datasetId/queries"
ApiKey = "[API_KEY]"
DatasetId = "[DATASET_ID]"

headers = {
  "Content-Type": "application/json",
  "authentication": ApiKey
}

# @dataset represents your dataset rows as a table
body = {
  "sqlQuery": "select * from @dataset limit 5"
}

# make request
url = ApiUrl.replace(':datasetId', DatasetId)
try:
  response = requests.post(url, headers=headers, data=json.dumps(body))
  data = response.json()
  print(data)
except requests.RequestException as error:
  print(f"Error: {error}")
PHP:

$ApiUrl = "https://api.nukl.ai/api/public/v1/datasets/:datasetId/queries";
$ApiKey = "[API_KEY]";
$DatasetId = "[DATASET_ID]";

$headers = [
  "Content-Type: application/json",
  "authentication: $ApiKey"
];

// @dataset represents your dataset rows as a table
$body = [
  "sqlQuery" => "select * from @dataset limit 5"
];

// make request
$ch = curl_init(str_replace(':datasetId', $DatasetId, $ApiUrl));

curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($body)); // convert to json object
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);

$result = curl_exec($ch);
curl_close($ch);

echo $result;
cURL:

curl -X POST 'https://api.nukl.ai/api/public/v1/datasets/[DATASET_ID]/queries' \
  -H 'Content-Type: application/json' \
  -H 'authentication: [API_KEY]' \
  -d '{"sqlQuery":"select * from @dataset limit 5"}'
Example: retrieving the query results by job ID.

JavaScript:

const ApiUrl = "https://api.nukl.ai/api/public/v1/datasets/:datasetId/queries/:jobId";
const ApiKey = "[API_KEY]";
const DatasetId = "[DATASET_ID]";
const JobId = "[JOB_ID]"; // retrieved from /queries request

const headers = {
  "Content-Type": "application/json",
  "authentication": ApiKey
}

// make request
fetch(ApiUrl.replace(':datasetId', DatasetId).replace(':jobId', JobId), {
  method: "GET",
  headers: headers
})
  .then((response) => response.json())
  .then((data) => {
    console.log(data);
  })
  .catch((error) => {
    console.error(error);
  });
Python:

import requests

ApiUrl = "https://api.nukl.ai/api/public/v1/datasets/:datasetId/queries/:jobId"
ApiKey = "[API_KEY]"
DatasetId = "[DATASET_ID]"
JobId = "[JOB_ID]"  # retrieved from /queries request

headers = {
  "Content-Type": "application/json",
  "authentication": ApiKey
}

# make request
url = ApiUrl.replace(':datasetId', DatasetId).replace(':jobId', JobId)
try:
  response = requests.get(url, headers=headers)
  data = response.json()
  print(data)
except requests.RequestException as error:
  print(f"Error: {error}")
PHP:

$ApiUrl = "https://api.nukl.ai/api/public/v1/datasets/:datasetId/queries/:jobId";
$ApiKey = "[API_KEY]";
$DatasetId = "[DATASET_ID]";
$JobId = "[JOB_ID]"; // retrieved from /queries request

$headers = [
  "Content-Type: application/json",
  "authentication: $ApiKey"
];

// make request
$ch = curl_init(str_replace(array(':datasetId', ':jobId'), array($DatasetId, $JobId), $ApiUrl));

curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);

$result = curl_exec($ch);
curl_close($ch);

echo $result;
cURL:

curl 'https://api.nukl.ai/api/public/v1/datasets/[DATASET_ID]/queries/[JOB_ID]' \
  -H 'Content-Type: application/json' \
  -H 'authentication: [API_KEY]'