Effortlessly Prepare Datasets with the OGB DatasetSaver Class

Jul 10, 2025

Introduction to OGB DatasetSaver

The OGB DatasetSaver class is aimed at developers and researchers who want to package their own graph property prediction datasets. It saves graphs, labels, and splits in a format compatible with the Open Graph Benchmark (OGB) framework. This blog post walks through the features, setup, and usage of the DatasetSaver class.

Main Features of DatasetSaver

  • Compatibility: Ensures datasets follow OGB conventions.
  • Flexibility: Supports both homogeneous and heterogeneous graphs.
  • Ease of Use: Streamlined methods for saving graphs, labels, and splits.
  • Meta Information: Automatically generates metadata for datasets.

Technical Architecture and Implementation

The DatasetSaver class ships with the ogb Python package and is imported from ogb.io. Its constructor takes the dataset name, a boolean indicating whether the graphs are heterogeneous, and the dataset version. The resulting saver object then exposes separate methods for saving graphs, labels, and splits, so it can be called from an existing data preparation script.

Constructor Example

from ogb.io import DatasetSaver
import numpy as np

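# Constructor: dataset name (with an OGB-style prefix such as 'ogbg-'),
# homogeneous vs. heterogeneous flag, and dataset version
dataset_name = 'ogbg-toy'
saver = DatasetSaver(dataset_name=dataset_name, is_hetero=False, version=1)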
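When is_hetero is True, graph objects are keyed by node type and edge type instead of holding single arrays. The sketch below builds one such dictionary using NumPy only. The key names (edge_index_dict, num_nodes_dict, node_feat_dict) reflect my understanding of OGB's heterogeneous graph format and should be checked against the OGB documentation; the node types, relation, and sizes are made up for illustration.

```python
import numpy as np

# Hypothetical toy schema: 4 authors, 3 papers, one relation type.
num_nodes_dict = {'author': 4, 'paper': 3}

# Edge types are (source type, relation, target type) triplets.
# Each value has shape (2, num_edges): row 0 = source ids, row 1 = target ids.
edge_index_dict = {
    ('author', 'writes', 'paper'): np.array([[0, 1, 2, 3],
                                             [0, 0, 1, 2]]),
}

# Optional per-type node features (here: 3 random values per paper).
node_feat_dict = {'paper': np.random.randn(num_nodes_dict['paper'], 3)}

hetero_graph = {
    'edge_index_dict': edge_index_dict,
    'num_nodes_dict': num_nodes_dict,
    'node_feat_dict': node_feat_dict,
}

# Basic consistency check: every edge endpoint must be a valid node id.
for (src, _, dst), edge_index in edge_index_dict.items():
    assert edge_index[0].max() < num_nodes_dict[src]
    assert edge_index[1].max() < num_nodes_dict[dst]
```

A list of such dictionaries would presumably be passed to a saver created with is_hetero=True, in the same way as the homogeneous case.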

Setup and Installation Process

To get started with the OGB DatasetSaver, you need to install the OGB library. You can do this using pip:

pip install ogb

Once installed, you can import the DatasetSaver class and begin preparing your datasets.

Usage Examples and API Overview

After setting up the DatasetSaver, you can start saving your graph data. Below are the steps to follow:

1. Saving Graph List

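Create a list of graph objects, where each graph is a dictionary holding edge_index and num_nodes plus optional node_feat and edge_feat arrays, and save it with the saver.save_graph_list(graph_list) method.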

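import networkx as nx  # used only to generate random toy graphs

graph_list = []
num_data = 100
for i in range(num_data):
    # Random undirected graph with 10 nodes and edge probability 0.5
    g = nx.fast_gnp_random_graph(10, 0.5)
    graph = dict()
    # edge_index has shape (2, num_edges): row 0 = source nodes, row 1 = target nodes
    graph['edge_index'] = np.array(g.edges).transpose()
    graph['num_nodes'] = len(g.nodes)
    # Features: 3 random values per node and per edge
    graph['node_feat'] = np.random.randn(graph['num_nodes'], 3)
    graph['edge_feat'] = np.random.randn(graph['edge_index'].shape[1], 3)
    graph_list.append(graph)

# Save the list of graph dictionaries
saver.save_graph_list(graph_list)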
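Before handing a graph list to the saver, it can help to check that each dictionary is internally consistent. The following is a minimal sketch of such a check; check_graph is a hypothetical helper written for this post, not part of the OGB API, and the rules it enforces are the ones implied by the example above (a 2 x num_edges edge_index, one feature row per node and per edge).

```python
import numpy as np

def check_graph(graph):
    """Hypothetical pre-save check for one homogeneous graph dictionary."""
    edge_index = np.asarray(graph['edge_index'])
    num_nodes = graph['num_nodes']
    if edge_index.ndim != 2 or edge_index.shape[0] != 2:
        raise ValueError('edge_index must have shape (2, num_edges)')
    if edge_index.size and (edge_index.min() < 0 or edge_index.max() >= num_nodes):
        raise ValueError('edge_index refers to a node id outside [0, num_nodes)')
    if 'node_feat' in graph and len(graph['node_feat']) != num_nodes:
        raise ValueError('node_feat needs one row per node')
    if 'edge_feat' in graph and len(graph['edge_feat']) != edge_index.shape[1]:
        raise ValueError('edge_feat needs one row per edge')

# Small hand-written graph: a path 0 - 1 - 2
toy_graph = {
    'edge_index': np.array([[0, 1], [1, 2]]),
    'num_nodes': 3,
    'node_feat': np.zeros((3, 3)),
    'edge_feat': np.zeros((2, 3)),
}
check_graph(toy_graph)  # passes silently
```

Running check_graph over every entry of graph_list before saving surfaces shape mistakes early, with an error message that names the offending field.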

2. Saving Target Labels

Save the target labels, one row per graph:

num_classes = 3  # illustrative number of classes
labels = np.random.randint(num_classes, size=(num_data, 1))
saver.save_target_labels(labels)
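A quick look at the label array before saving can catch shape or range mistakes. This sketch assumes single-task classification with labels of shape (num_data, 1), as in the snippet above; num_classes = 3 and the fixed random seed are illustrative choices, not values prescribed by OGB.

```python
import numpy as np

num_data = 100     # number of graphs, as in the example above
num_classes = 3    # illustrative value

rng = np.random.default_rng(0)
labels = rng.integers(num_classes, size=(num_data, 1))

# One row per graph, one column per task.
assert labels.shape == (num_data, 1)
# Class ids should lie in [0, num_classes).
assert labels.min() >= 0 and labels.max() < num_classes

# Count how many graphs fall into each class.
class_counts = np.bincount(labels.ravel(), minlength=num_classes)
print(dict(enumerate(class_counts.tolist())))
```

The printed counts give a rough picture of class balance, which is useful context when choosing an evaluation metric later.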

3. Saving Dataset Split

Build train, validation, and test index arrays (an 80/10/10 random split here) and save them under a split name:

split_idx = dict()
perm = np.random.permutation(num_data)
split_idx['train'] = perm[:int(0.8*num_data)]
split_idx['valid'] = perm[int(0.8*num_data): int(0.9*num_data)]
split_idx['test'] = perm[int(0.9*num_data):]
saver.save_split(split_idx, split_name='random')
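A split is easy to get subtly wrong, for example by reusing an index in two subsets. The sketch below rebuilds the same 80/10/10 split and verifies it; check_split is a hypothetical helper, and the requirement that every graph lands in exactly one subset is an assumption made for this example rather than a documented OGB rule.

```python
import numpy as np

num_data = 100
rng = np.random.default_rng(0)
perm = rng.permutation(num_data)

split_idx = {
    'train': perm[:int(0.8 * num_data)],
    'valid': perm[int(0.8 * num_data):int(0.9 * num_data)],
    'test': perm[int(0.9 * num_data):],
}

def check_split(split_idx, num_data):
    """Hypothetical helper: splits must be disjoint and cover all indices."""
    all_idx = np.concatenate([split_idx[k] for k in ('train', 'valid', 'test')])
    if len(all_idx) != num_data:
        raise ValueError('split sizes do not add up to num_data')
    if not np.array_equal(np.sort(all_idx), np.arange(num_data)):
        raise ValueError('splits overlap or miss some indices')

check_split(split_idx, num_data)
print({name: len(idx) for name, idx in split_idx.items()})
# {'train': 80, 'valid': 10, 'test': 10}
```

Seeding the generator, as done here, also makes the split reproducible across runs.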

Community and Contribution Aspects

OGB is developed in the open and accepts contributions from developers and researchers alike. You can report bugs, suggest features, or contribute code through pull requests.

License and Legal Considerations

The OGB package, which includes DatasetSaver, is distributed under the MIT License. This allows you to use, modify, and distribute the software, provided the copyright and license notice are retained. The license file in the repository is the authoritative source for the exact terms.

Conclusion

The OGB DatasetSaver class is a practical choice for anyone preparing graph property prediction datasets. With a constructor and a handful of save methods, it covers graphs, labels, and splits while keeping the result compatible with OGB conventions, for both homogeneous and heterogeneous graphs.

Resources

For more information, check out the official GitHub repository.

FAQ

What is OGB?

The Open Graph Benchmark (OGB) is a collection of benchmark datasets for graph machine learning.

How do I contribute to OGB?

You can contribute by reporting issues, suggesting features, or submitting pull requests following the guidelines in the repository.

What license is OGB under?

OGB is released under the MIT License, which permits use, modification, and distribution as long as the copyright and license notice are retained.