sherlock

v0.8.18 Latest
Warning: This package is not in the latest version of its module.
Published: Dec 17, 2025 License: Apache-2.0 Imports: 36 Imported by: 3

README

Sherlock

Illustration of Sherlock Holmes and Watson in a train car, by Sidney Paget. From Arthur Conan Doyle's 1892 book 'The Adventure of Silver Blaze'

Relentless Metadata Inspector

Sherlock is a Go library that inspects a URL for any and all available metadata, pulling from whatever metadata formats are available, and returning it as an ActivityStreams 2.0 document.

The goal is to have a standard interface into all web content, regardless of competing data standards.

Supported Formats

ActivityPub/ActivityStreams

MicroFormats

Open Graph

In Progress

🚧 WebFinger

🚧 JSON-LD (Linked)

🚧 Twitter Metadata

🚧 Microdata

🚧 RDFa

🚧 oEmbed data provider

Using Sherlock
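client := sherlock.NewClient()

// If you only have a URL, then pass it in to .Load()
result, err := client.Load("https://my-url-here")

// If you have already downloaded a file, then pass it to .ParseHTML()
// NOTE: ParseHTML is named in this README but is not listed among the
// exported functions below; confirm it exists in the version you use.
var buffer bytes.Buffer // holds the HTML that was already downloaded
result, err = sherlock.ParseHTML("https://original-url", &buffer)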

Using Sherlock with Hannibal

Sherlock can also be used as an http client for Hannibal, the ActivityPub library for Go. This allows many other online resources to look like they're ActivityPub-enabled.

Documentation

Overview

Package sherlock is a library for extracting metadata from web pages. It uses as many methods as possible to extract page data, including:

- ActivityStreams/JSON-LD
- Open Graph
- Microformats2

Coming soon:

- HTML Meta Tags
- oEmbed
- JSON-LD
- Twitter Cards?

Constants

const ContentType = "Content-Type"

ContentType is the string used in the HTTP header to designate a MIME type

const ContentTypeActivityPub = "application/activity+json"

ContentTypeActivityPub is the standard MIME type for ActivityPub content

const ContentTypeAtom = "application/atom+xml"

ContentTypeAtom is the standard MIME Type for Atom Feeds

const ContentTypeForm = "application/x-www-form-urlencoded"

ContentTypeForm is the standard MIME Type for Form encoded content

const ContentTypeHTML = "text/html"

ContentTypeHTML is the standard MIME type for HTML content

const ContentTypeJSON = "application/json"

ContentTypeJSON is the standard MIME Type for JSON content

const ContentTypeJSONFeed = "application/feed+json"

ContentTypeJSONFeed is the standard MIME Type for JSON Feed content: https://en.wikipedia.org/wiki/JSON_Feed

const ContentTypeJSONLD = "application/ld+json"

ContentTypeJSONLD is the standard MIME Type for JSON-LD content: https://en.wikipedia.org/wiki/JSON-LD

const ContentTypeJSONResourceDescriptor = "application/jrd+json"

ContentTypeJSONResourceDescriptor is the standard MIME Type for JSON Resource Descriptor content which is used by WebFinger: https://datatracker.ietf.org/doc/html/rfc7033#section-10.2
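To show where this MIME type typically appears, here is a minimal sketch of building a WebFinger lookup request as described in RFC 7033. The helper name webFingerURL and the example handle are hypothetical, and the constant is a local copy of the documented value; this is not Sherlock's own WebFinger code, which the README lists as in progress.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// Local copy of the documented value, so the sketch compiles without importing sherlock.
const contentTypeJSONResourceDescriptor = "application/jrd+json"

// webFingerURL (hypothetical helper) turns "@user@host" or "user@host"
// into the RFC 7033 lookup URL for that account.
func webFingerURL(handle string) (string, error) {
	handle = strings.TrimPrefix(strings.TrimSpace(handle), "@")
	user, host, found := strings.Cut(handle, "@")
	if !found || user == "" || host == "" {
		return "", fmt.Errorf("not a user@host handle: %q", handle)
	}

	// The account is passed as an "acct:" URI in the "resource" query parameter.
	query := url.Values{"resource": {"acct:" + user + "@" + host}}
	return "https://" + host + "/.well-known/webfinger?" + query.Encode(), nil
}

func main() {
	target, err := webFingerURL("@alice@example.social")
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	// Ask the server for a JSON Resource Descriptor.
	req.Header.Set("Accept", contentTypeJSONResourceDescriptor)
	fmt.Println(req.URL.String())
	fmt.Println(req.Header.Get("Accept"))
}
```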

const ContentTypePlain = "text/plain"

ContentTypePlain is the default plaintext MIME type

const ContentTypeRSS = "application/rss+xml"

ContentTypeRSS is the standard MIME Type for RSS Feeds

const ContentTypeXML = "application/xml"

ContentTypeXML is the standard MIME Type for XML content

const FormatActivityStream = "ACTIVITYSTREAM"

const FormatJSONFeed = "JSONFEED"

const FormatMicroFormats = "MICROFORMATS"

const FormatRSS = "RSS"

const HTTPHeaderAccept = "Accept"

HTTPHeaderAccept is the string used in the HTTP header to request a response be encoded as a MIME type

const HTTPHeaderCacheControl = "Cache-Control"

const HTTPHeaderLink = "Link"

const IdentifierTypeNone = "NONE"

const IdentifierTypeURL = "URL"

const IdentifierTypeUsername = "USERNAME"

const LinkRelationAlternate = "alternate"

const LinkRelationFeed = "feed"

const LinkRelationHub = "hub"

const LinkRelationIcon = "icon"

const LinkRelationSelf = "self"
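The header and MIME type constants above are the kind of values a client combines for content negotiation. The sketch below uses only the standard library and local copies of the documented values. The helper names buildAccept and newMetadataRequest are hypothetical, and the preference order is an assumption, not Sherlock's documented behavior.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Local copies of documented values, so the sketch compiles without importing sherlock.
const (
	headerAccept    = "Accept"
	typeActivityPub = "application/activity+json"
	typeJSONLD      = "application/ld+json"
	typeHTML        = "text/html"
)

// buildAccept (hypothetical helper) joins MIME types into a single
// Accept header value, listed in order of preference.
func buildAccept(types ...string) string {
	return strings.Join(types, ", ")
}

// newMetadataRequest (hypothetical helper) builds a GET request that asks
// for ActivityPub first and falls back to JSON-LD, then HTML.
func newMetadataRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set(headerAccept, buildAccept(typeActivityPub, typeJSONLD, typeHTML))
	return req, nil
}

func main() {
	req, err := newMetadataRequest("https://example.com/page")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Header.Get(headerAccept))
}
```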

Variables

This section is empty.

Functions

func AuthorizedFetch added in v0.8.0

func AuthorizedFetch(publicKeyID string, privateKey crypto.PrivateKey) remote.Option

AuthorizedFetch is a remote.Option that signs all outbound requests according to the ActivityPub "Authorized Fetch" convention: https://funfedi.dev/testing_tools/http_signatures/
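"Authorized Fetch" relies on HTTP Signatures: the client signs a few request fields with its private key, and the server verifies them with the public key published at the key ID. The following is a minimal standard-library sketch of that idea, using the common "(request-target) host date" field set with RSA-SHA256. The function names, the example URLs, and the key ID are hypothetical, and the signing code that Sherlock actually delegates to may differ in fields and algorithms.

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"net/http"
	"strings"
	"time"
)

// signingString builds the text that is signed: request target, host, and date.
func signingString(req *http.Request) string {
	return fmt.Sprintf("(request-target): %s %s\nhost: %s\ndate: %s",
		strings.ToLower(req.Method), req.URL.RequestURI(), req.URL.Host, req.Header.Get("Date"))
}

// signRequest (hypothetical helper) adds Date and Signature headers to the request.
func signRequest(req *http.Request, publicKeyID string, privateKey *rsa.PrivateKey) error {
	req.Header.Set("Date", time.Now().UTC().Format(http.TimeFormat))

	digest := sha256.Sum256([]byte(signingString(req)))
	signature, err := rsa.SignPKCS1v15(rand.Reader, privateKey, crypto.SHA256, digest[:])
	if err != nil {
		return err
	}

	req.Header.Set("Signature", fmt.Sprintf(
		`keyId="%s",algorithm="rsa-sha256",headers="(request-target) host date",signature="%s"`,
		publicKeyID, base64.StdEncoding.EncodeToString(signature)))
	return nil
}

// verifyRequest (hypothetical helper) recomputes the signing string and
// checks it against the signature parameter, as a receiving server would.
func verifyRequest(req *http.Request, publicKey *rsa.PublicKey) error {
	_, rest, found := strings.Cut(req.Header.Get("Signature"), `signature="`)
	if !found {
		return fmt.Errorf("no signature parameter found")
	}
	signature, err := base64.StdEncoding.DecodeString(strings.TrimSuffix(rest, `"`))
	if err != nil {
		return err
	}
	digest := sha256.Sum256([]byte(signingString(req)))
	return rsa.VerifyPKCS1v15(publicKey, crypto.SHA256, digest[:], signature)
}

// signAndVerify signs a request with a throwaway key, optionally changes the
// path afterwards, and returns the verification result.
func signAndVerify(tamper bool) error {
	privateKey, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodGet, "https://example.social/users/alice", nil)
	if err != nil {
		return err
	}
	if err := signRequest(req, "https://example.com/users/bob#main-key", privateKey); err != nil {
		return err
	}
	if tamper {
		req.URL.Path = "/users/mallory" // invalidates the signed request target
	}
	return verifyRequest(req, &privateKey.PublicKey)
}

func main() {
	fmt.Println("untampered request error:", signAndVerify(false))
	fmt.Println("tampered request rejected:", signAndVerify(true) != nil)
}
```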

func IsValidAddress added in v0.6.5

func IsValidAddress(address string) bool

IsValidAddress returns TRUE for all values that Sherlock THINKS it SHOULD be able to process. This includes @username@host.tld and https://host.tld/username addresses. IMPORTANT: Just because this function returns TRUE does NOT mean that the address is valid. It only means that the address looks like a valid format; it still needs to be checked.
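The distinction between "looks valid" and "is valid" can be illustrated with a purely syntactic check. The function below is a hypothetical sketch, not the library's implementation: it accepts fediverse-style handles and http(s) URLs by shape alone and never contacts the network. Accepting plain http URLs and handles without a leading "@" are assumptions made for this example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// looksLikeAddress (hypothetical) reports whether the input has the shape of
// a handle such as "@user@host" or of an http(s) URL. It does not prove that
// anything exists at that address.
func looksLikeAddress(address string) bool {
	address = strings.TrimSpace(address)

	// URL form: must parse and must name a host.
	if strings.HasPrefix(address, "https://") || strings.HasPrefix(address, "http://") {
		parsed, err := url.Parse(address)
		return err == nil && parsed.Host != ""
	}

	// Handle form: optional leading "@", then user, "@", host.
	handle := strings.TrimPrefix(address, "@")
	user, host, found := strings.Cut(handle, "@")
	return found && user != "" && host != "" && !strings.ContainsAny(host, "@/ ")
}

func main() {
	for _, candidate := range []string{
		"@alice@example.social",
		"https://example.social/alice",
		"not an address",
	} {
		fmt.Printf("%q looks like an address: %v\n", candidate, looksLikeAddress(candidate))
	}
}
```

A TRUE result here would still need to be followed by an actual lookup, which is the point the IMPORTANT note makes.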

func ParseOEmbed

func ParseOEmbed(reader io.Reader, data mapof.Any)

Types

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client implements the hannibal/streams.Client interface, and is used to load JSON-LD documents from remote servers. The sherlock client maps additional meta-data into a standard ActivityStreams document.

func NewClient

func NewClient(options ...ClientOption) Client

NewClient returns a fully initialized Client object

func (Client) Delete added in v0.8.12

func (client Client) Delete(documentID string) error

func (Client) Load

func (client Client) Load(url string, options ...any) (streams.Document, error)

Load retrieves a document from a remote server and returns it as a streams.Document. It uses either the "Actor" or the "Document" method of generating its ActivityStreams result. "Document" treats the URL as a single ActivityStreams document, translating OpenGraph, MicroFormats, and JSON-LD into an ActivityStreams equivalent. "Actor" treats the URL as an Actor, translating RSS, Atom, JSON, and MicroFormats feeds into an ActivityStreams equivalent.
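How Load decides which translator to run is not spelled out here. One plausible ingredient is dispatching on the response's Content-Type header, sketched below with the standard library. The function guessFormat and the mapping from MIME types to Format values are assumptions made for illustration (for example, treating Atom like RSS); only the constant values themselves come from this page.

```go
package main

import (
	"fmt"
	"mime"
)

// Local copies of the documented Format values.
const (
	formatActivityStream = "ACTIVITYSTREAM"
	formatJSONFeed       = "JSONFEED"
	formatMicroFormats   = "MICROFORMATS"
	formatRSS            = "RSS"
)

// guessFormat (hypothetical) maps a Content-Type header value to a format
// name, or returns "" when the type is unknown or malformed.
func guessFormat(contentType string) string {
	// ParseMediaType strips parameters such as "; charset=utf-8"
	// and lowercases the media type.
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return ""
	}

	switch mediaType {
	case "application/activity+json", "application/ld+json":
		return formatActivityStream
	case "application/feed+json":
		return formatJSONFeed
	case "application/rss+xml", "application/atom+xml":
		return formatRSS // assumption: Atom is handled alongside RSS
	case "text/html":
		return formatMicroFormats // assumption: HTML is scanned for embedded metadata
	}
	return ""
}

func main() {
	for _, header := range []string{
		"application/activity+json; charset=utf-8",
		"application/rss+xml",
		"text/html; charset=utf-8",
		"image/png",
	} {
		fmt.Printf("%-45s -> %q\n", header, guessFormat(header))
	}
}
```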

func (Client) Save added in v0.8.12

func (client Client) Save(document streams.Document) error

func (Client) SetRootClient added in v0.8.12

func (client Client) SetRootClient(rootClient streams.Client)

func (*Client) With added in v0.8.12

func (client *Client) With(options ...ClientOption)

type ClientOption added in v0.6.0

type ClientOption func(*Client)

func WithKeyPairFunc added in v0.8.12

func WithKeyPairFunc(fn KeyPairFunc) ClientOption

WithKeyPairFunc is a ClientOption that sets the KeyPairFunc for a Client. This allows the Client to retrieve the public key ID and private key for a given URL only when needed, rather than performing expensive database queries ahead of time.
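The benefit of passing a function instead of a key is that the lookup runs only when a request actually needs signing. The sketch below demonstrates that with stand-in types: lazyClient, load, and countingKeyPair are hypothetical names, the ed25519 key stands in for an expensive database lookup, and only the KeyPairFunc signature mirrors this page.

```go
package main

import (
	"crypto"
	"crypto/ed25519"
	"fmt"
)

// keyPairFunc mirrors the documented KeyPairFunc signature.
type keyPairFunc func() (publicKeyID string, privateKey crypto.PrivateKey)

// lazyClient is a hypothetical stand-in for a client that stores the
// function rather than the key itself.
type lazyClient struct {
	keyPair keyPairFunc
}

// load pretends to fetch a URL. The key pair function is called only when
// the request needs a signature.
func (c lazyClient) load(url string, needsSignature bool) string {
	if !needsSignature || c.keyPair == nil {
		return "unsigned GET " + url
	}
	keyID, _ := c.keyPair() // the private key would be used to sign here
	return "signed GET " + url + " with key " + keyID
}

// countingKeyPair returns a key pair function plus a counter that records
// how many times the (simulated) expensive lookup ran.
func countingKeyPair() (keyPairFunc, *int) {
	calls := 0
	fn := func() (string, crypto.PrivateKey) {
		calls++
		_, privateKey, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
		if err != nil {
			panic(err)
		}
		return "https://example.com/users/bob#main-key", privateKey
	}
	return fn, &calls
}

func main() {
	fn, calls := countingKeyPair()
	client := lazyClient{keyPair: fn}

	fmt.Println(client.load("https://example.social/public", false))
	fmt.Println("lookups so far:", *calls)

	fmt.Println(client.load("https://example.social/private", true))
	fmt.Println("lookups so far:", *calls)
}
```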

func WithUserAgent added in v0.6.0

func WithUserAgent(userAgent string) ClientOption

WithUserAgent is a ClientOption that sets the UserAgent property on the Client object

type Config added in v0.8.12

type Config struct {
	UserAgent        string // User-Agent string to send with every request
	DocumentType     int
	MaximumRedirects int
	RemoteOptions    []remote.Option // Additional options to pass to the remote library
	DefaultValue     map[string]any
}

type KeyPairFunc added in v0.8.12

type KeyPairFunc func() (publicKeyID string, privateKey crypto.PrivateKey)

type Option added in v0.8.12

type Option func(*Config)

func AsActor added in v0.6.0

func AsActor() Option

AsActor tells Sherlock to try parsing the URL as an Actor object.

func AsCollection added in v0.6.0

func AsCollection() Option

AsCollection tells Sherlock to try parsing the URL as a Collection object

func AsDocument added in v0.6.0

func AsDocument() Option

AsDocument tells Sherlock to try parsing the URL as a Document object

func WithDefaultValue added in v0.6.0

func WithDefaultValue(defaultValue map[string]any) Option

WithDefaultValue is an Option that sets the DefaultValue, which is used as the base value for all documents loaded by the Client.

func WithKeyPair added in v0.8.12

func WithKeyPair(publicKeyID string, privateKey crypto.PrivateKey) Option

WithKeyPair is an Option that sets up the AuthorizedFetch remote middleware, which will sign all outbound requests according to the ActivityPub "Authorized Fetch" convention: https://funfedi.dev/testing_tools/http_signatures/

func WithMaximumRedirects added in v0.6.0

func WithMaximumRedirects(maximumRedirects int) Option

WithMaximumRedirects is an Option that sets the maximum number of redirects that the Client will follow when loading a document.
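For readers unfamiliar with how a redirect cap works in Go, the sketch below enforces one with the standard library's http.Client.CheckRedirect hook and exercises it against a local test server that redirects forever. The helpers newLimitedClient and followLoop are hypothetical; how Sherlock applies MaximumRedirects internally is not shown on this page.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// newLimitedClient (hypothetical helper) returns a client that follows at
// most maximumRedirects redirects before giving up with an error.
func newLimitedClient(maximumRedirects int) *http.Client {
	return &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			// via holds every request made so far, so its length equals
			// the number of redirects that would have been followed.
			if len(via) > maximumRedirects {
				return fmt.Errorf("stopped after %d redirects", maximumRedirects)
			}
			return nil
		},
	}
}

// followLoop starts a local server that always redirects, requests it with
// the given limit, and reports how many requests reached the server.
func followLoop(maximumRedirects int) (int, error) {
	var hits int32
	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt32(&hits, 1)
		http.Redirect(w, r, "/again", http.StatusFound)
	}))
	defer server.Close()

	resp, err := newLimitedClient(maximumRedirects).Get(server.URL)
	if resp != nil {
		resp.Body.Close()
	}
	return int(atomic.LoadInt32(&hits)), err
}

func main() {
	hits, err := followLoop(3)
	fmt.Println("requests received by server:", hits)
	fmt.Println("error:", err)
}
```

With a limit of N, the server in this sketch receives the initial request plus N redirected ones before the client stops.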

func WithRemoteOptions added in v0.6.0

func WithRemoteOptions(options ...remote.Option) Option

WithRemoteOptions is an Option that adds remote.Options which are passed to the remote library when making requests.
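The Option functions above follow Go's functional options pattern: each one returns a closure that mutates a Config. The sketch below reproduces the pattern with local stand-in types so it runs without importing sherlock. The lowercase names, the DocumentType values, and the default of 6 redirects are all hypothetical.

```go
package main

import "fmt"

// config is a trimmed, local stand-in for the documented Config struct.
type config struct {
	UserAgent        string
	DocumentType     int
	MaximumRedirects int
	DefaultValue     map[string]any
}

// option mirrors the documented shape: type Option func(*Config).
type option func(*config)

// Hypothetical DocumentType values; the real ones are not shown on this page.
const (
	documentTypeDocument = iota
	documentTypeActor
)

// asActor marks the request as one that should be parsed as an Actor.
func asActor() option {
	return func(c *config) { c.DocumentType = documentTypeActor }
}

// withMaximumRedirects sets the redirect cap.
func withMaximumRedirects(maximumRedirects int) option {
	return func(c *config) { c.MaximumRedirects = maximumRedirects }
}

// withDefaultValue sets the base value for loaded documents.
func withDefaultValue(defaultValue map[string]any) option {
	return func(c *config) { c.DefaultValue = defaultValue }
}

// newConfig starts from (hypothetical) defaults and applies each option in
// order, so later options override earlier ones.
func newConfig(options ...option) config {
	result := config{DocumentType: documentTypeDocument, MaximumRedirects: 6}
	for _, apply := range options {
		apply(&result)
	}
	return result
}

func main() {
	cfg := newConfig(
		asActor(),
		withMaximumRedirects(2),
		withDefaultValue(map[string]any{"type": "Person"}),
	)
	fmt.Printf("%+v\n", cfg)
}
```

Per the Load signature, options of this kind are passed as trailing arguments after the URL, which keeps the common call short while leaving room for per-request settings.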
