Documentation
¶
Overview ¶
Package ia contains utilities for working with files from the Internet Archive.
Index ¶
- Constants
- func DecodeDigest(digest string) (*[20]byte, error)
- func DownloadFile(url, filename string) error
- func DownloadFileChecked(url, filename string, sha1Sum []byte) error
- func DownloadTorrents(ids []string, dir string) error
- func GetTimemap(pageURL string, options *TimemapOptions) ([][]string, error)
- func NewReadValidator(r io.Reader, name string, md5Sum, sha1Sum, crc32Sum []byte) io.Reader
- func PageURL(url, timestamp string) string
- func Save(pageURL string, options *SaveOptions) error
- func Search(query string) ([]string, error)
- func Validate(dir string) error
- func ValidateFile(filename string, md5Sum, sha1Sum, crc32Sum []byte) error
- type FileMeta
- type ItemMeta
- type SaveOptions
- type TimemapOptions
Constants ¶
View Source
const TimestampFormat = "20060102150405"
Variables ¶
This section is empty.
Functions ¶
func DecodeDigest ¶
DecodeDigest decodes a base32-encoded SHA-1 digest.
func DownloadFile ¶
func DownloadFileChecked ¶
func DownloadTorrents ¶
DownloadTorrents downloads the named Internet Archive items via torrent.
func GetTimemap ¶
func GetTimemap(pageURL string, options *TimemapOptions) ([][]string, error)
GetTimemap gets a list of Internet Archive captures of the given URL.
func NewReadValidator ¶
func Save ¶
func Save(pageURL string, options *SaveOptions) error
func ValidateFile ¶
Types ¶
type FileMeta ¶
type FileMeta struct {
Name string `xml:"name,attr"` // filename, relative to root
Source string `xml:"source,attr"` // "original", "metadata", or "derivative"
Format string `xml:"format"` // e.g., "Text", "Metadata", "Unknown"
Original string `xml:"original"`
BTIH jsonutil.Hex `xml:"btih"` // BitTorrent info-hash
ModTime timefmt.UnixSec `xml:"mtime"`
Size int64 `xml:"size"`
MD5 jsonutil.Hex `xml:"md5"`
CRC32 jsonutil.Hex `xml:"crc32"`
SHA1 jsonutil.Hex `xml:"sha1"`
Length float64 `xml:"length"` // audio duration
Height int `xml:"height"` // image height
Width int `xml:"width"` // image width
Private bool `xml:"private"`
}
FileMeta contains file metadata listed in the *_files.xml file in the root of an item. This file is excluded for torrent downloads.
func ReadFileMeta ¶
func (*FileMeta) OpenValidator ¶
func (fm *FileMeta) OpenValidator(dir string) (io.ReadCloser, error)
type ItemMeta ¶
type ItemMeta struct {
Identifier string `xml:"identifier"`
Collections []string `xml:"collection"`
Description string `xml:"description"`
Mediatype string `xml:"mediatype"` // e.g., "software"
Subject string `xml:"subject"`
Title string `xml:"title"`
Uploader string `xml:"uploader"`
Publicdate string `xml:"publicdate"` // "2006-01-02 15:04:05" format
Addeddate string `xml:"addeddate"` // "2006-01-02 15:04:05" format
Curation string `xml:"curation"`
BackupLocation string `xml:"backup_location"` // removed from meta in April 2020
}
ItemMeta contains item metadata in the *_meta.xml file in the root of an item.
func ReadItemMeta ¶
type SaveOptions ¶
type TimemapOptions ¶
type TimemapOptions struct {
MatchPrefix bool // whether url is a prefix (* wildcard is appended)
Collapse string // field to collapse by; earliest captures with unique field is kept
Fields []string // e.g., urlkey,timestamp,endtimestamp,original,mimetype,statuscode,digest,redirect,robotflags,length,offset,filename,groupcount,uniqcount
Limit int // e.g., 100000
}
TimemapOptions contains options for a timemap API call.
Click to show internal directories.
Click to hide internal directories.