ouroboros-consensus-0.20.1.0: Consensus layer for the Ouroboros blockchain protocol
Safe HaskellSafe-Inferred
LanguageHaskell2010

Ouroboros.Consensus.Storage.ChainDB.Impl.Iterator

Description

Iterators

Synopsis

Documentation

closeAllIteratorsIOLike m ⇒ ChainDbEnv m blk → m () Source #

Close all open Iterators.

This can be called when the ChainDB is already closed.

stream ∷ ∀ m blk b. (IOLike m, HasHeader blk, HasCallStack) ⇒ ChainDbHandle m blk → ResourceRegistry m → BlockComponent blk b → StreamFrom blk → StreamTo blk → m (Either (UnknownRange blk) (Iterator m blk b)) Source #

Stream blocks

Start & end point

The start point can either be in the ImmutableDB (on our chain) or in the VolatileDB (on our chain or on a recent fork). We first check whether it is in the VolatileDB, if not, we check if it is in the ImmutableDB (see "Garbage collection" for why this order is important). Similarly for the end point.

If a bound can't be found in the ChainDB, an UnknownRange error is returned.

When the bounds are nonsensical, e.g., > StreamFromExclusive (Point (SlotNo 3) _) > StreamToInclusive (RealPoint (SlotNo 3) _) An InvalidIteratorRange exception is thrown.

Paths of blocks

To stream blocks from the ImmutableDB we can simply use the iterators offered by the ImmutableDB.

To stream blocks from the VolatileDB we have to construct a path of points backwards through the VolatileDB, starting from the end point using getPredecessor until we get to the start point, genesis, or we get to a block that is not in the VolatileDB. Then, for each point in the path, we can ask the VolatileDB for the corresponding block.

If the path through the VolatileDB is incomplete, we will first have to stream blocks from the ImmutableDB and then switch to the path through the VolatileDB. We only allow the tip of the ImmutableDB to be the switchover point between the two DBs. In other words, the incomplete path through the VolatileDB must fit onto the tip of the ImmutableDB. This must be true at the time of initialising the iterator, but does not have to hold during the whole lifetime of the iterator. If it doesn't fit on it, it means the path forked off more than k blocks in the past and blocks belonging to it are more likely to go missing because of garbage-collection (see the next paragraph). In that case, we return ForkTooOld.

Garbage collection

We have to be careful about the following: as our chain grows, blocks from our chain will be copied to the ImmutableDB in the background. After a while, old blocks will be garbage-collected from the VolatileDB. Blocks that were part of the current chain will be in the ImmutableDB, but blocks that only lived on forks will be gone forever.

This means that blocks that were part of the VolatileDB when the iterator was initialised might no longer be part of the VolatileDB when we come to the point that the iterator will try to read them. When this is noticed, we will try to open an iterator from the ImmutableDB to obtain the blocks that have moved over. However, this will only work if they were and are part of the current chain, otherwise they will have been deleted from the VolatileDB without being copied to the ImmutableDB.

This iterator is opened with an open upper bound and will be used to stream blocks until the path has been fully streamed, the iterator is exhausted, or a block doesn't match the expected point. In the latter two cases, we switch back to the VolatileDB. If the block is missing from the VolatileDB, we will switch back to streaming from the ImmutableDB. If that fails, we switch back to the VolatileDB. To avoid eternally switching between the two DBs, we only switch back to the VolatileDB if the stream from the ImmutableDB has made progress, i.e. streamed at least one block with the expected point. If no block was streamed from the ImmutableDB, not even the first one, we know for sure that that block isn't part of the VolatileDB (the reason we switch to the ImmutableDB) and isn't part of the ImmutableDB (no block was streamed). In that case, we return IteratorBlockGCed and stop the stream.

Note that the open upper bound doesn't allow us to include blocks in the stream that are copied to the ImmutableDB after opening this iterator, as the bound of the iterator is fixed upon initialisation. These newly added blocks will be included in the stream because we will repeatedly open new ImmutableDB iterators (as long as we make progress).

Bounds checking

The VolatileDB is hash-based instead of point-based. While the bounds of a stream are points, we can simply check whether the hashes of the bounds match the hashes stored in the points.

The ImmutableDB is slot-based instead of point-based, which means that before we know whether a block in the ImmutableDB matches a given point, we must first read the block's hash corresponding to the point's slot from the (cached) on-disk indices, after which we can then verify whether it matches the hash of the point. This is important for the start and end bounds (both points) of a stream in case they are in the ImmutableDB (i.e., their slots are <= the tip of the ImmutableDB): we must first read the hashes corresponding to the bounds from the (cached) on-disk indices to be sure the range is valid. Note that these reads happen before the first call to iteratorNext.

Note that when streaming to an exclusive bound, the block corresponding to that bound (Point) must exist in the ChainDB.

The ImmutableDB will keep the on-disk indices of a chunk of blocks in memory after the first read so that the next lookup doesn't have to read from disk. When both bounds are in the same chunk, which will typically be the case, only checking the first bound will require disk reads, the second will be cached.

Costs

Opening an iterator has some costs:

  • When blocks have to be streamed from the ImmutableDB: as discussed in "Bounds checking", the hashes corresponding to the bounds have to be read from the (cached) on-disk indices.
  • When blocks have to be streamed both from the ImmutableDB and the VolatileDB, only the hash of the block corresponding to the lower bound will have to be read from the ImmutableDB upfront, as described in the previous bullet point. Note that the hash of the block corresponding to the upper bound does not have to be read from disk, since it will be in the VolatileDB, which means that we know its hash already from the in-memory index.

In summary:

  • Only streaming from the VolatileDB: 0 (cached) reads from disk upfront.
  • Only streaming from the ImmutableDB: 2 (cached) reads from disk upfront.
  • Streaming from both the ImmutableDB and the VolatileDB: 1 (cached) read from disk upfront.

Additionally, when we notice during streaming that a block is no longer in the VolatileDB, we try to see whether it can be streamed from the ImmutableDB instead. Opening such an iterator costs 2 (cached) reads from disk upfront. This can happen multiple times.

Exported for testing purposes

data IteratorEnv m blk Source #

Environment containing everything needed to implement iterators.

The main purpose of bundling these things in a separate record is to make it easier to test this code: no need to set up a whole ChainDB, just provide this record.

newIterator Source #

Arguments

∷ ∀ m blk b. (IOLike m, HasHeader blk, HasCallStack) 
IteratorEnv m blk 
→ (∀ r. (IteratorEnv m blk → m r) → m r)

Function with which the operations on the returned iterator should obtain their IteratorEnv. This function should check whether the ChainDB is still open or throw an exception otherwise. This makes sure that when we call iteratorNext, we first check whether the ChainDB is still open.

ResourceRegistry m 
BlockComponent blk b 
StreamFrom blk 
StreamTo blk 
→ m (Either (UnknownRange blk) (Iterator m blk b)) 

See stream.