Class Archive

Class Documentation

class Archive

The Archive class to access content in a zim file.

The Archive is the main class to access content in a zim file. Archive objects are lightweight and can be copied easily.

An Archive is read-only, and its internal state (such as caches) is protected against race conditions. Therefore, all methods of Archive are threadsafe.

Zim archives exist with two different namespace schemes: an old one and a new one. The method hasNewNamespaceScheme tells which scheme the archive uses.

When using old namespace scheme:

  • User entries may be stored in different namespaces (historically A, I, J or -). The path of an entry therefore contains the namespace as a “top level directory”: A/foo.html, I/image.png, …

  • All API functions taking or returning a path expect (or return) a path with the namespace.

When using new namespace scheme:

  • User entries are always stored without namespace. (For information, they are all stored in the same namespace, C. Still, consider that there is no namespace, as the whole API masks it.) As there is no namespace, paths don’t contain it: foo.html, image.png, …

  • All API functions taking or returning a path expect (or return) a path without the namespace.

This difference may seem complex to handle, but it is not. As all paths returned by the API are consistent with the paths it expects, you simply have to use each path as it is. Forget about the namespace, and if a path has one, simply consider it as a subdirectory. The only place where it could be problematic is when you already have a path stored somewhere (bookmark, …) using one scheme and you use it on an archive with the other scheme. For this case, the method getEntryByPath has a compatibility layer that tries to transform the path to the new scheme as a fallback if the entry is not found.

All methods of Archive may throw a ZimFileFormatError if the file is invalid.

Public Functions

explicit Archive(const std::string &fname)

Archive constructor.

Construct an archive from a filename. The file is opened read-only.

The filename is the “logical” path. So if you want to open a split zim file (foo.zimaa, foo.zimab, …) you must pass the foo.zim path.

Parameters:

fname – The filename to the file to open (utf8 encoded)
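When an application is handed the name of one part of a split archive, it needs the logical path before calling this constructor. The following is a minimal sketch of that mapping. logicalZimPath is a hypothetical helper, and it assumes that parts are named with two lowercase letters appended to the logical name, as in the foo.zimaa / foo.zimab example above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Map a part filename such as "foo.zimaa" to the logical path "foo.zim".
// Assumption: a part is named "<logical path>" + two lowercase letters.
std::string logicalZimPath(const std::string& path) {
  const std::string ext = ".zim";
  if (path.size() >= ext.size() + 2) {
    const std::size_t cut = path.size() - 2;
    const bool twoLetters =
        std::islower(static_cast<unsigned char>(path[cut])) &&
        std::islower(static_cast<unsigned char>(path[cut + 1]));
    if (twoLetters && path.compare(cut - ext.size(), ext.size(), ext) == 0) {
      return path.substr(0, cut);
    }
  }
  return path;  // Already a logical path, or not a recognised part name.
}
```

The result can then be passed as fname; paths that do not look like a part name are returned unchanged.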

explicit Archive(int fd)

Archive constructor.

Construct an archive from a file descriptor. Fd is used only at Archive creation. Ownership of the fd is not taken and it must be closed by caller.

Note: This function is not available under Windows.

Parameters:

fd – The descriptor of a seekable file representing a ZIM archive

Archive(int fd, offset_type offset, size_type size)

Archive constructor.

Construct an archive from a descriptor of a file with an embedded ZIM archive inside. Fd is used only at Archive creation. Ownership of the fd is not taken and it must be closed by caller.

Note: This function is not available under Windows.

Parameters:
  • fd – The descriptor of a seekable file with a continuous segment representing a complete ZIM archive.

  • offset – The offset of the ZIM archive relative to the beginning of the file (rather than the current position associated with fd).

  • size – The size of the ZIM archive.

explicit Archive(FdInput fd)

Archive constructor.

Construct an archive from a descriptor of a file with an embedded ZIM archive inside. Fd is used only at Archive creation. Ownership of the fd is not taken and it must be closed by caller.

Note: This function is not available under Windows.

Parameters:

fd – A FdInput (tuple) containing the fd (int), offset (offset_type) and size (size_type) referencing a continuous segment representing a complete ZIM archive.

explicit Archive(const std::vector<FdInput> &fds)

Archive constructor.

Construct an archive from several file descriptors. Each part may be embedded in a file. Fds are used only at Archive creation. Ownership of the fds is not taken and they must be closed by caller. Fds (int) can be the same between FdInput if the parts belong to the same file.

Note: This function is not available under Windows.

Parameters:

fds – A vector of FdInput (tuple) containing the fd (int), offset (offset_type) and size (size_type) referencing a series of segments representing a complete ZIM archive.

const std::string &getFilename() const

Return the filename of the zim file.

Return the filename as passed to the constructor (So foo.zim).

Returns:

The logical filename of the archive.

size_type getFilesize() const

Return the logical archive size.

Return the size of the full archive, not the size of the file on the fs. If the zim is split, return the sum of the size of the parts.

Returns:

The logical size of the archive.

entry_index_type getAllEntryCount() const

Return the number of entries in the archive.

Return the total number of entries in the archive, including internal entries created by libzim itself, metadata, indexes, …

Returns:

the number of all entries in the archive.

entry_index_type getEntryCount() const

Return the number of user entries in the archive.

If the notion of “user entries” doesn’t exist in the zim archive, returns getAllEntryCount().

Returns:

the number of user entries in the archive.

entry_index_type getArticleCount() const

Return the number of articles in the archive.

The definition of “article” depends on the zim archive. On recent archives, this corresponds to all entries marked as “FRONT_ARTICLE” at creation time. On old archives, this corresponds to all “text/html*” entries.

Returns:

the number of articles in the archive.

entry_index_type getMediaCount() const

Return the number of media in the archive.

This definition of “media” is based on the mimetype.

Returns:

the number of media in the archive.
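The four counters can be read together to get a quick overview of an archive. This is only a sketch: ArchiveT stands in for zim::Archive so that the code compiles without libzim, and ArchiveCounts, readCounts and nonUser are hypothetical names introduced for illustration.

```cpp
#include <cstdint>

// Hypothetical summary type, not part of libzim.
struct ArchiveCounts {
  std::uint64_t all;
  std::uint64_t user;
  std::uint64_t articles;
  std::uint64_t media;

  // Entries counted by getAllEntryCount() but not by getEntryCount().
  // This is 0 when the archive has no notion of "user entries".
  std::uint64_t nonUser() const { return all - user; }
};

// ArchiveT stands in for zim::Archive.
template <typename ArchiveT>
ArchiveCounts readCounts(const ArchiveT& archive) {
  return ArchiveCounts{archive.getAllEntryCount(), archive.getEntryCount(),
                       archive.getArticleCount(), archive.getMediaCount()};
}
```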

Uuid getUuid() const

The uuid of the archive.

Returns:

the uuid of the archive.

std::string getMetadata(const std::string &name) const

Get a specific metadata content.

Get the content of a metadata stored in the archive.

Parameters:

name – The name of the metadata.

Throws:

EntryNotFound – If the metadata is not in the archive.

Returns:

The content of the metadata.

Item getMetadataItem(const std::string &name) const

Get a specific metadata item.

Get the item associated to a metadata stored in the archive.

Parameters:

name – The name of the metadata.

Throws:

EntryNotFound – If the metadata is not in the archive.

Returns:

The item associated to the metadata.

std::vector<std::string> getMetadataKeys() const

Get the list of metadata stored in the archive.

Returns:

The list of metadata in the archive.

Item getIllustrationItem(unsigned int size = 48) const

Get the illustration item of the archive.

The illustration is an icon for the archive that can be used in a catalog and elsewhere to illustrate the archive.

Parameters:

size – The size (width and height) of the illustration to get. Defaults to 48 (48x48px icon).

Throws:

EntryNotFound – If no illustration item can be found.

Returns:

The illustration item.

std::set<unsigned int> getIllustrationSizes() const

Return a list of available sizes (width) for the illustrations in the archive.

An illustration is an icon for the archive that can be used in a catalog and elsewhere to illustrate the archive. An Archive may contain several illustrations with different sizes. This method tells which illustrations are in the archive (by size: width).

Returns:

A set of sizes.
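A caller often wants "the best available icon for a given display size". One possible policy, sketched below, is to take the smallest available size that is at least the wanted one, and to fall back to the largest one otherwise. pickIllustrationSize is a hypothetical helper and the policy is an assumption made for illustration; the result is meant to be passed to getIllustrationItem.

```cpp
#include <set>

// Choose a size among those reported by getIllustrationSizes().
// Returns 0 when no illustration is available at all.
unsigned int pickIllustrationSize(const std::set<unsigned int>& available,
                                  unsigned int wanted) {
  if (available.empty()) {
    return 0;
  }
  // First available size that is >= wanted (can be scaled down).
  const auto it = available.lower_bound(wanted);
  if (it != available.end()) {
    return *it;
  }
  // Every available size is smaller than wanted: take the largest.
  return *available.rbegin();
}
```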

Entry getEntryByPath(entry_index_type idx) const

Get an entry using its “path” index.

Use the index of the entry to get the idx’th entry (entries being sorted by path).

Parameters:

idx – The index of the entry.

Throws:

std::out_of_range – If idx is greater than the number of entries.

Returns:

The Entry.

Entry getEntryByPath(const std::string &path) const

Get an entry using a path.

Search for an entry in the zim, using its path. On an archive with the new namespace scheme, path must not contain the namespace. On an archive without the new namespace scheme, path must contain the namespace. A compatibility layer exists to accept an “old” path on a new archive (and the opposite), which helps when using a saved path (bookmark) on a new archive. On a new archive, we first search for the path in the C namespace, then try to remove the potential namespace from the path and search again in the C namespace with the path “without namespace”. On an old archive, we first assume the path contains a namespace and, if it does not (or if no entry is found), search in namespaces A, I, J and -.

Parameters:

path – The entry’s path.

Throws:

EntryNotFound – If no entry has the requested path.

Returns:

The Entry.
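The "remove the potential namespace" step described above can be pictured with the following sketch. It is not libzim's actual implementation: stripLegacyNamespace is a hypothetical helper, and it assumes that a legacy namespace is one of the characters A, I, J or - followed by a slash.

```cpp
#include <string>

// If `path` starts with a one-character legacy namespace followed by '/',
// return the path without it; otherwise return the path unchanged.
// Assumption: legacy user namespaces are exactly 'A', 'I', 'J' and '-'.
std::string stripLegacyNamespace(const std::string& path) {
  const std::string legacy = "AIJ-";
  if (path.size() > 2 && path[1] == '/' &&
      legacy.find(path[0]) != std::string::npos) {
    return path.substr(2);
  }
  return path;
}
```

A path such as A/foo.html is ambiguous on a new archive (it may be a real top-level directory named A), which is consistent with the description above: the path is first searched as it is, and the stripped form is only a fallback.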

Entry getEntryByTitle(entry_index_type idx) const

Get an entry using its “title” index.

Use the index of the entry to get the idx’th entry (entries being sorted by title).

Parameters:

idx – The index of the entry.

Throws:

std::out_of_range – If idx is greater than the number of entries.

Returns:

The Entry.

Entry getEntryByTitle(const std::string &title) const

Get an entry using a title.

Get an entry using its title.

Parameters:

title – The entry’s title.

Throws:

EntryNotFound – If no entry has the requested title.

Returns:

The Entry.

Entry getEntryByClusterOrder(entry_index_type idx) const

Get an entry using its “cluster” index.

Use the index of the entry to get the idx’th entry. The actual order of the entries is not really specified. It is inferred from the internal way the entries are stored.

This method is probably not relevant and is provided for completeness. You should probably use an iterator using the efficientOrder.

Parameters:

idx – The index of the entry.

Throws:

std::out_of_range – If idx is greater than the number of entries.

Returns:

The Entry.

Entry getMainEntry() const

Get the main entry of the archive.

Throws:

EntryNotFound – If no main entry has been specified in the archive.

Returns:

The Main entry.

Entry getRandomEntry() const

Get a random entry.

The entry is picked randomly from the front article list.

Throws:

EntryNotFound – If no valid random entry can be found.

Returns:

A random entry.

inline bool hasEntryByPath(const std::string &path) const

Check if an entry with the given path exists in the archive.

The path follows the same requirements as for getEntryByPath.

Parameters:

path – The entry’s path.

Returns:

True if the path is in the archive, false otherwise.

inline bool hasEntryByTitle(const std::string &title) const

Check if an entry with the given title exists in the archive.

Parameters:

title – The entry’s title.

Returns:

True if the title is in the archive, false otherwise.

bool hasMainEntry() const

Check if the archive has a main entry.

Returns:

True if the archive has a main entry.
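hasMainEntry makes it possible to choose a landing page without relying on exceptions for the common case. The sketch below is templated so that it compiles without libzim (ArchiveT stands in for zim::Archive); startPath is a hypothetical helper name, and falling back to a random entry is a design choice made for illustration.

```cpp
#include <string>

// ArchiveT stands in for zim::Archive. Hypothetical helper, not libzim API.
template <typename ArchiveT>
std::string startPath(const ArchiveT& archive) {
  if (archive.hasMainEntry()) {
    return archive.getMainEntry().getPath();
  }
  // No main entry: fall back to a random one. As documented above,
  // getRandomEntry() may still throw EntryNotFound.
  return archive.getRandomEntry().getPath();
}
```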

bool hasIllustration(unsigned int size = 48) const

Check if the archive has an illustration entry.

Parameters:

size – The size (width and height) of the illustration to check. Defaults to 48 (48x48px icon).

Returns:

True if the archive has a corresponding illustration entry. (Always true if the archive has no illustration but has a favicon.)

bool hasFulltextIndex() const

Check if the archive has a fulltext index.

Returns:

True if the archive has a fulltext index.

bool hasTitleIndex() const

Check if the archive has a title index.

Returns:

True if the archive has a title index.

EntryRange<EntryOrder::pathOrder> iterByPath() const

Get an “iterable” by path order.

This method allows iterating over all user entries in path order. If the notion of “user entries” doesn’t exist (for old zim archives), this iterates over all entries in the zim file.

for (const auto& entry : archive.iterByPath()) {
   ...
}

Returns:

A range on all the entries, in path order.

EntryRange<EntryOrder::titleOrder> iterByTitle() const

Get an “iterable” by title order.

This method allows iterating over all articles in title order. The definition of “article” depends on the zim archive. On recent archives, this corresponds to all entries marked as “FRONT_ARTICLE” at creation time. On old archives, this corresponds to all entries in the ‘A’ namespace. A few archives may have been created without namespace but also without a specific article listing. In this case, this iterates over all user entries.

for (const auto& entry : archive.iterByTitle()) {
   ...
}

Returns:

A range on all the entries, in title order.

EntryRange<EntryOrder::efficientOrder> iterEfficient() const

Get an “iterable” by an efficient order.

This method allows iterating over all user entries in an efficient order. If the notion of “user entries” doesn’t exist (for old zim archives), this iterates over all entries in the zim file.

for (const auto& entry : archive.iterEfficient()) {
   ...
}

Returns:

A range on all the entries, in efficient order.

EntryRange<EntryOrder::pathOrder> findByPath(std::string path) const

Find a range of entries starting with path.

When using new namespace scheme, path must not contain the namespace (foo.html). When using old namespace scheme, path must contain the namespace (A/foo.html). Contrary to getEntryByPath, there is no compatibility layer, path must follow the archive scheme.

Parameters:

path – The path prefix to search for.

Returns:

A range starting from the first entry starting with path and ending past the last entry. If no entry starts with path, begin == end.

EntryRange<EntryOrder::titleOrder> findByTitle(std::string title) const

Find a range of entries starting with title.

When using the old namespace scheme, the entry title is searched for in the A namespace.

Parameters:

title – The title prefix to search for.

Returns:

A range starting from the first entry starting with title and ending past the last entry. If no entry starts with title, begin == end.
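A typical use of findByTitle is a bounded list of title suggestions. The sketch below is templated so that it compiles without libzim (ArchiveT stands in for zim::Archive) and suggestTitles is a hypothetical helper name. It assumes that EntryRange::offset(start, maxResults) skips start entries and keeps at most maxResults of them, which its signature suggests but this page does not state.

```cpp
#include <string>
#include <vector>

// ArchiveT stands in for zim::Archive. Hypothetical helper, not libzim API.
template <typename ArchiveT>
std::vector<std::string> suggestTitles(const ArchiveT& archive,
                                       const std::string& prefix, int limit) {
  // Keep the narrowed range in a named variable so that it outlives the loop.
  // Assumption: offset(0, limit) keeps at most `limit` entries from the start.
  const auto range = archive.findByTitle(prefix).offset(0, limit);
  std::vector<std::string> titles;
  for (const auto& entry : range) {
    titles.push_back(entry.getTitle());
  }
  return titles;
}
```

If no entry starts with the prefix, the range is empty and the function returns an empty vector.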

bool hasChecksum() const

Check if the archive has a checksum.

The checksum is not the checksum of the file. It is an internal checksum stored in the zim file.

Returns:

True if the archive has a checksum.

std::string getChecksum() const

Get the checksum stored in the archive.

Returns:

The checksum stored in the archive. If the archive has no checksum, returns an empty string.

bool check() const

Check that the zim file is valid (in regard to its checksum).

If the zim file has no checksum return false.

Returns:

True if the file is valid.
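Because check() returns false both for a corrupted file and for a file without checksum, callers that need to tell the two cases apart can combine it with hasChecksum(). A minimal sketch follows; ArchiveT stands in for zim::Archive so that the code compiles without libzim, and ChecksumStatus and checksumStatus are hypothetical names.

```cpp
// Hypothetical result type, not part of libzim.
enum class ChecksumStatus { Missing, Valid, Invalid };

// ArchiveT stands in for zim::Archive.
template <typename ArchiveT>
ChecksumStatus checksumStatus(const ArchiveT& archive) {
  if (!archive.hasChecksum()) {
    // check() would return false here, which does not mean "corrupted".
    return ChecksumStatus::Missing;
  }
  return archive.check() ? ChecksumStatus::Valid : ChecksumStatus::Invalid;
}
```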

bool checkIntegrity(IntegrityCheck checkType)

Check the integrity of the zim file.

Run different types of checks to verify that the zim file is valid (in regard to the zim format). This may be time consuming.

Returns:

True if the file is valid.

bool isMultiPart() const

Check if the file is split in the filesystem.

Returns:

True if the archive is split in different files (foo.zimaa, foo.zimab).

bool hasNewNamespaceScheme() const

Check if the zim archive uses the new namespace scheme.

Recent zim files use the new namespace scheme.

From the user’s perspective, it means that:

  • With the old namespace scheme:

      • All entries are accessible, either using getEntryByPath with a specific namespace or simply by iterating over the entries (with the iter* methods).

      • An entry’s path includes the namespace (“A/foo.html”).

  • With the new namespace scheme:

      • Only the “user” entries are accessible with getEntryByPath and the iter* methods. To access metadata, use the getMetadata method.

      • An entry’s path does not contain the namespace (“foo.html”).

inline std::shared_ptr<FileImpl> getImpl() const

Get a shared pointer to the FileImpl.

template<EntryOrder order>
class EntryRange

A range of entries in an Archive.

EntryRange represents a range of entries in a specific order.

An EntryRange can’t be modified and is consequently threadsafe.

Public Functions

inline explicit EntryRange(const std::shared_ptr<FileImpl> file, entry_index_type begin, entry_index_type end)
inline iterator<order> begin() const
inline iterator<order> end() const
inline int size() const
inline EntryRange<order> offset(int start, int maxResults) const
template<EntryOrder order>
class iterator

An iterator on an Archive.

Archive::iterator stores an internal state which is not protected against race conditions. It is not threadsafe.

An EntryRange can’t be modified and is consequently threadsafe.

Be aware that the referenced/pointed Entry is generated and stored in the iterator itself. Once the iterator is destructed or incremented/decremented, you must NOT use the Entry.
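In practice, this means copying whatever is needed out of the Entry before moving the iterator, instead of keeping a reference or pointer to the Entry. The sketch below is templated on the range type so that it compiles without libzim (RangeT stands in for an EntryRange); copyPaths is a hypothetical helper name.

```cpp
#include <string>
#include <vector>

// RangeT stands in for an EntryRange. Hypothetical helper, not libzim API.
template <typename RangeT>
std::vector<std::string> copyPaths(const RangeT& range) {
  std::vector<std::string> paths;
  for (auto it = range.begin(); it != range.end(); ++it) {
    // Copy the data out now: the Entry behind `it` lives inside the iterator
    // and must not be used after ++it or after `it` is destroyed.
    paths.push_back(it->getPath());
  }
  return paths;
}
```

The returned strings are independent copies, so they stay valid after the loop ends.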

Public Types

using iterator_category = std::input_iterator_tag
using value_type = Entry
using pointer = Entry*
using reference = Entry&

Public Functions

inline explicit iterator(const std::shared_ptr<FileImpl> file, entry_index_type idx)
inline iterator(const iterator<order> &other)
inline bool operator==(const iterator<order> &it) const
inline bool operator!=(const iterator<order> &it) const
iterator<order> &operator=(iterator<order> &&it) = default
inline iterator<order> &operator=(iterator<order> &it)
inline iterator<order> &operator++()
inline iterator<order> operator++(int)
inline iterator<order> &operator--()
inline iterator<order> operator--(int)
inline const Entry &operator*() const
inline const Entry *operator->() const