Cookbook
########

.. contents:: Contents
   :depth: 1
   :backlinks: top
   :local:

.. _cookbook-iterate-datasets:

Iterate and reframe datasets
****************************

Let's first load the bencoded data from the compressed file:

.. code-block:: python

   from allisbns.dataset import load_bencoded

   input_path = "aa_isbn13_codes_20251118T170842Z.benc.zst"
   with open(input_path, "rb") as f:
       input_data = load_bencoded(f)

Then create an iterator over all datasets and iterate:

.. code-block:: python

   from allisbns.dataset import iterate_datasets, CodeDataset
   from allisbns.isbn import LAST_ISBN

   for dataset in iterate_datasets(input_data):
       ...

The iteration can be restricted to selected collections:

.. code-block:: python

   for dataset in iterate_datasets(
       input_data, collections=["md5", "rgb"]
   ):
       ...

The datasets can also be lazily reframed to new bounds while iterating. For
example, let's iterate over the '978' region of all datasets:

.. code-block:: python

   from allisbns.isbn import get_prefix_bounds

   # Get the corresponding bounds
   start_isbn, end_isbn = get_prefix_bounds("978")

   # Create the iterator, fill all datasets to the end ISBN
   iterator = iterate_datasets(input_data, fill_to_isbn=end_isbn)

   # Use a generator expression to lazily reframe datasets
   reframing = (x.reframe(start_isbn, end_isbn) for x in iterator)

   for reframed_dataset in reframing:
       ...

Merge and save datasets
***********************

Create the iterator as above and union all datasets together:

.. code-block:: python

   from allisbns.isbn import LAST_ISBN
   from allisbns.merge import union

   # The bounds must be the same
   iterator = iterate_datasets(input_data, fill_to_isbn=LAST_ISBN)
   all_merged = union(iterator)

After merging, we can save the resulting codes to a file for later use. For
example, let's temporarily save them to a binary file in the NumPy ``.npy``
format:

.. code-block:: python

   from pathlib import Path

   import numpy as np

   # Reuse the timestamp from the input file name
   timestamp = str(input_path).split(".")[0].split("_")[-1]
   # Use a Path so that the suffix can be replaced later
   output_path = Path(f"ms_isbn13_codes_{timestamp}_all.npy")
   with open(output_path, "wb") as f:
       np.save(f, all_merged.codes, allow_pickle=False)

To write the result in the original bencoded format with compression, we can
use :meth:`~allisbns.dataset.CodeDataset.write_bencoded`:

.. code-block:: python

   with open(output_path.with_suffix(".benc.zst"), "wb") as f:
       all_merged.write_bencoded(f, prefix="all")
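
The output file names used in the saving steps above can be derived in one
place. The following is a minimal sketch that relies only on the standard
library. The helper name ``derive_output_paths`` and its ``label`` parameter
are hypothetical and not part of ``allisbns``; the ``ms_isbn13_codes_`` naming
simply follows the example above:

.. code-block:: python

   from pathlib import Path


   def derive_output_paths(input_path, label="all"):
       """Return (npy_path, benc_path) derived from the input file name.

       Hypothetical helper for illustration; not part of allisbns.
       """
       # "aa_isbn13_codes_20251118T170842Z.benc.zst" -> "20251118T170842Z"
       timestamp = Path(input_path).name.split(".")[0].split("_")[-1]
       npy_path = Path(f"ms_isbn13_codes_{timestamp}_{label}.npy")
       # with_suffix() replaces only the last suffix, here ".npy"
       benc_path = npy_path.with_suffix(".benc.zst")
       return npy_path, benc_path


   npy_path, benc_path = derive_output_paths(
       "aa_isbn13_codes_20251118T170842Z.benc.zst"
   )
   print(npy_path)
   print(benc_path)

Working with :class:`pathlib.Path` objects rather than plain strings is what
makes the ``with_suffix`` call possible, since ``str`` has no such method.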