Documentation ¶
Overview ¶
Package internal provides caching functionality for content extraction results. It implements a thread-safe LRU cache with TTL support to improve performance for repeated extractions of the same content.
Package internal provides centralized constant definitions for internal use.
Package internal provides character encoding detection and conversion functionality. It supports 15+ encodings including Unicode variants, Western European, and East Asian character sets, with intelligent auto-detection capabilities.
Package internal provides implementation details for the cybergodev/html library. It contains content extraction, table processing, and text manipulation functionality that is not part of the public API.
Package internal provides pooled resources for memory allocation optimization.
Package internal provides implementation details for the cybergodev/html library. This file contains the Scorer interface and default implementation for content scoring.
Package internal provides unsafe utility functions for zero-allocation conversions.
Package internal provides URL parsing and resolution utilities.
Index ¶
- Constants
- Variables
- func BytesToString(b []byte) string
- func CalculateContentDensity(n *html.Node) float64
- func CleanContentNode(node *html.Node) *html.Node
- func CleanText(text string, whitespaceRegex *regexp.Regexp) string
- func ConvertToUTF8(data []byte, charset string) ([]byte, error)
- func CountChildElements(n *html.Node, tag string) int
- func CountTags(n *html.Node) int
- func DetectAndConvertToUTF8(data []byte) ([]byte, string, error)
- func DetectAndConvertToUTF8String(data []byte, forcedEncoding string) (string, string, error)
- func DetectAudioType(url string) string
- func DetectCharsetFromBytes(data []byte) string
- func DetectVideoType(url string) string
- func ExtractBaseFromURL(url string) string
- func ExtractDomain(url string) string
- func ExtractTextWithStructureAndImages(node *html.Node, sb *strings.Builder, imageCounter *int, linkCounter *int, ...)
- func FindElementByTag(doc *html.Node, tagName string) *html.Node
- func GetBuffer() *bytes.Buffer
- func GetBuilder() *strings.Builder
- func GetHash128() hash.Hash
- func GetLinkDensity(node *html.Node) float64
- func GetNamespacePrefix(tag string) string
- func GetTextContent(node *html.Node) string
- func GetTextLength(node *html.Node) int
- func GetTransformBuffer() *[]byte
- func IsBlockElement(tag string) bool
- func IsDifferentDomain(baseURL, targetURL string) bool
- func IsExternalURL(url string) bool
- func IsInlineElement(tag string) bool
- func IsKnownInlineNamespacePrefix(prefix string) bool
- func IsNamespaceTag(tag string) bool
- func IsNonContentElement(tag string) bool
- func IsParagraphLevelBlockElement(tag string) bool
- func IsValidURL(url string) bool
- func IsVideoURL(url string) bool
- func MatchesPattern(value string, patterns map[string]bool) bool
- func NormalizeBaseURL(baseURL string) string
- func PutBuffer(buf *bytes.Buffer)
- func PutBuilder(sb *strings.Builder)
- func PutHash128(h hash.Hash)
- func PutTransformBuffer(buf *[]byte)
- func RemoveTagContent(content, tag string) string
- func ReplaceHTMLEntities(text string) string
- func ResolveURL(baseURL, relativeURL string) string
- func SanitizeHTML(htmlContent string) string
- func SanitizeHTMLWithAudit(htmlContent string, audit AuditRecorder) string
- func ScoreAttributes(n *html.Node) int
- func ScoreContentNode(node *html.Node) int
- func SelectBestCandidate(candidates map[*html.Node]int) *html.Node
- func SetPoolLogger(logger func(format string, args ...any))
- func ShouldRemoveElement(n *html.Node) bool
- func ShouldTreatAsBlockElement(node *html.Node) bool
- func ShouldTreatNamespaceTagAsInline(node *html.Node) bool
- func StringToBytes(s string) []byte
- func TableProcessor() *table.Processor
- func WalkNodes(node *html.Node, fn func(*html.Node) bool)
- type AuditRecorder
- type Cache
- type DefaultScorer
- type EncodingDetector
- func (ed *EncodingDetector) DetectAndConvert(data []byte) ([]byte, string, error)
- func (ed *EncodingDetector) DetectCharset(data []byte) string
- func (ed *EncodingDetector) DetectCharsetBasic(data []byte) string
- func (ed *EncodingDetector) DetectCharsetSmart(data []byte) EncodingMatch
- func (ed *EncodingDetector) SetMaxSampleSize(size int) *EncodingDetector
- func (ed *EncodingDetector) ToUTF8(data []byte, charset string) ([]byte, error)
- type EncodingMatch
- type NoOpAuditRecorder
- type Scorer
- type ScoringConfig
Examples ¶
Constants ¶
const (
	// URL validation limits
	MaxURLLength     = 2000   // Maximum URL length
	MaxDataURILength = 100000 // Maximum data URL length (100KB)
)
const (
DefaultCacheCleanupInterval = 5 * time.Minute
)
Default cache cleanup configuration
Variables ¶
var BufferPool = sync.Pool{
	New: func() any {
		return bytes.NewBuffer(make([]byte, 0, bufferPoolInitialCapacity))
	},
}
BufferPool is a sync.Pool for bytes.Buffer instances. Use this for functions that work with byte slices to reduce allocations.
For most use cases, prefer the helper functions GetBuffer() and PutBuffer():
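buf := internal.GetBuffer()
defer internal.PutBuffer(buf)
buf.Grow(estimatedSize)
// ... use buf ...
// Copy before returning: the deferred PutBuffer resets buf and returns it to
// the pool, so the slice from buf.Bytes() must not escape this function.
return append([]byte(nil), buf.Bytes()...)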
Direct pool access is also available for advanced use cases:
// The pool stores *bytes.Buffer; use the pointer directly and never copy the Buffer value.
buf := internal.BufferPool.Get().(*bytes.Buffer)
defer func() {
	buf.Reset()
	internal.BufferPool.Put(buf)
}()
var BuilderPool = sync.Pool{
	New: func() any {
		sb := &strings.Builder{}
		sb.Grow(builderPoolInitialCapacity)
		return sb
	},
}
BuilderPool is a sync.Pool for strings.Builder instances. Use this for functions that build strings incrementally to reduce allocations.
For most use cases, prefer the helper functions GetBuilder() and PutBuilder():
sb := internal.GetBuilder()
defer internal.PutBuilder(sb)
sb.Grow(estimatedSize)
// ... use sb ...
return sb.String()
Direct pool access is also available for advanced use cases:
// The pool stores *strings.Builder; a strings.Builder must not be copied after first use.
sb := internal.BuilderPool.Get().(*strings.Builder)
defer func() {
	sb.Reset()
	internal.BuilderPool.Put(sb)
}()
Hash128Pool is a sync.Pool for FNV-128a hash instances. Use this for cache key generation to avoid repeated allocations.
Usage pattern:
h := internal.GetHash128()
defer internal.PutHash128(h)
h.Write(data)
var buf [16]byte
sum := h.Sum(buf[:0])
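As a rough illustration of the cache-key use case, the sketch below derives a hex key from FNV-128a using only the standard library. The helper name `cacheKey` is hypothetical, and the sketch allocates a fresh hasher instead of drawing one from Hash128Pool.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"hash/fnv"
)

// cacheKey is a hypothetical helper: it hashes data with FNV-128a and
// returns the 16-byte digest as a 32-character hex string.
func cacheKey(data []byte) string {
	h := fnv.New128a()
	h.Write(data) // hash.Hash.Write never returns an error
	var buf [16]byte
	// Sum appends the digest to buf[:0], so the digest lands in buf.
	return hex.EncodeToString(h.Sum(buf[:0]))
}

func main() {
	fmt.Println(len(cacheKey([]byte("<html>...</html>"))))
	fmt.Println(cacheKey([]byte("a")) == cacheKey([]byte("a")))
}
```

Passing `buf[:0]` to `Sum` lets the digest be written into a stack-allocated array, which is presumably why the usage pattern above is written that way.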
var TransformBufferPool = sync.Pool{
	New: func() any {
		buf := make([]byte, 0, 8192)
		return &buf
	},
}
TransformBufferPool is a sync.Pool for byte slices used in encoding transformation. These buffers are used for charset conversion operations.
Functions ¶
func BytesToString ¶ added in v1.3.0
BytesToString converts a byte slice to string without memory allocation. The returned string shares memory with the input slice.
WARNING: The caller must ensure the byte slice is not modified after this call. Modifying the slice will cause undefined behavior in the returned string.
Use this only when the byte slice is guaranteed to remain unchanged, such as when converting read-only data or when the result has a short lifetime.
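For readers unfamiliar with the technique, here is a minimal sketch of zero-allocation conversions using the `unsafe` helpers available since Go 1.20. The function names are hypothetical and this is not necessarily how the package implements BytesToString or StringToBytes.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets b as a string without copying.
// The caller must not modify b while the string is in use.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// stringToBytes reinterprets s as a byte slice without copying.
// The returned slice must never be written to.
func stringToBytes(s string) []byte {
	if s == "" {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	raw := []byte("read-only payload")
	fmt.Println(bytesToString(raw))
	fmt.Println(len(stringToBytes("abc")))
}
```

Both results alias the input memory, which is exactly why the warnings above apply.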
func CalculateContentDensity ¶
CalculateContentDensity calculates text-to-tag ratio. This is the exported version that uses the internal calculateDensityFromMetrics.
func CleanContentNode ¶
CleanContentNode removes non-content elements from the node tree.
func ConvertToUTF8 ¶ added in v1.2.0
ConvertToUTF8 is a convenience function that converts data in the given charset to UTF-8.
func CountChildElements ¶
CountChildElements counts the child elements of n that have the given tag.
func DetectAndConvertToUTF8 ¶ added in v1.2.0
DetectAndConvertToUTF8 is a convenience function that detects the charset and converts the data to UTF-8.
func DetectAndConvertToUTF8String ¶ added in v1.2.0
DetectAndConvertToUTF8String detects encoding and converts to UTF-8 string. If forcedEncoding is not empty, it will use that encoding instead of auto-detection. Returns a UTF-8 string and the detected/used encoding. Uses safe string conversion to ensure memory safety.
func DetectAudioType ¶
DetectAudioType detects the audio MIME type from a URL.
func DetectCharsetFromBytes ¶ added in v1.2.0
DetectCharsetFromBytes is a convenience function that detects the charset from byte data.
func DetectVideoType ¶
DetectVideoType detects the video MIME type from a URL.
func ExtractBaseFromURL ¶ added in v1.2.0
ExtractBaseFromURL extracts the base URL (scheme://domain/) from a URL. Returns the base URL including trailing slash, or empty string for invalid URLs.
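The described behavior can be approximated with `net/url`. The sketch below is an assumption about the general approach, not the package's code, and `extractBase` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
)

// extractBase returns "scheme://host/" for an absolute URL and the empty
// string when the input has no scheme or host, or fails to parse.
func extractBase(raw string) string {
	u, err := url.Parse(raw)
	if err != nil || u.Scheme == "" || u.Host == "" {
		return ""
	}
	return u.Scheme + "://" + u.Host + "/"
}

func main() {
	fmt.Println(extractBase("https://example.com/docs/page.html?x=1"))
	fmt.Printf("%q\n", extractBase("/only/a/path"))
}
```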
func ExtractDomain ¶ added in v1.2.0
ExtractDomain extracts the domain from a URL. Returns the domain portion (scheme://domain) or empty string for invalid URLs.
func ExtractTextWithStructureAndImages ¶
func ExtractTextWithStructureAndImages(node *html.Node, sb *strings.Builder, imageCounter *int, linkCounter *int, tableFormat string)
ExtractTextWithStructureAndImages extracts text content from an HTML node tree while preserving document structure (headings, paragraphs, lists, tables).
func GetBuffer ¶ added in v1.3.0
GetBuffer gets a bytes.Buffer from the pool. The returned buffer has been reset and is ready for use. Call PutBuffer when done to return it to the pool.
IMPORTANT: Callers MUST ensure PutBuffer is called even on error paths. Use defer immediately after GetBuffer to guarantee cleanup:
buf := internal.GetBuffer()
defer internal.PutBuffer(buf)
// ... use buf ...
Failure to return the buffer to the pool will not cause memory leaks (the GC will collect it), but will reduce the effectiveness of the pool.
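A minimal sketch of how such get/put helpers are commonly built on `sync.Pool` follows. The names `bufPool`, `getBuffer`, `putBuffer`, and `render`, and the 4096-byte initial capacity, are assumptions for illustration only.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool is a hypothetical stand-in for the package's BufferPool.
var bufPool = sync.Pool{
	New: func() any { return bytes.NewBuffer(make([]byte, 0, 4096)) },
}

// getBuffer returns a reset buffer from the pool.
func getBuffer() *bytes.Buffer {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	return buf
}

// putBuffer resets buf and returns it to the pool; nil is a no-op.
func putBuffer(buf *bytes.Buffer) {
	if buf == nil {
		return
	}
	buf.Reset()
	bufPool.Put(buf)
}

// render builds a string in a pooled buffer. buf.String() copies the
// bytes, so the result stays valid after the buffer is recycled.
func render(name string) string {
	buf := getBuffer()
	defer putBuffer(buf) // runs on every return path
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String()
}

func main() {
	fmt.Println(render("pool"))
}
```

Placing the `defer` directly after the get call is what guarantees the buffer goes back to the pool on error paths as well.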
func GetBuilder ¶ added in v1.3.0
GetBuilder gets a strings.Builder from the pool. The returned builder has been reset and is ready for use. Call PutBuilder when done to return it to the pool.
IMPORTANT: Callers MUST ensure PutBuilder is called even on error paths. Use defer immediately after GetBuilder to guarantee cleanup:
sb := internal.GetBuilder()
defer internal.PutBuilder(sb)
// ... use sb ...
Failure to return the builder to the pool will not cause memory leaks (the GC will collect it), but will reduce the effectiveness of the pool.
func GetHash128 ¶ added in v1.3.0
GetHash128 gets an FNV-128a hasher from the pool. The returned hasher has been reset and is ready for use. Call PutHash128 when done to return it to the pool.
func GetLinkDensity ¶
func GetNamespacePrefix ¶ added in v1.3.0
GetNamespacePrefix extracts the namespace prefix from a namespaced tag. For "ix:nonnumeric", it returns "ix".
func GetTextContent ¶
Example ¶
ExampleGetTextContent demonstrates the GetTextContent function with HTML entities.
html := `<p> © 2025 — All rights reserved </p>`
doc, _ := stdxhtml.Parse(strings.NewReader(html))
result := GetTextContent(doc)
fmt.Println(result)
Output: © 2025 — All rights reserved
func GetTextLength ¶
func GetTransformBuffer ¶ added in v1.3.0
func GetTransformBuffer() *[]byte
GetTransformBuffer gets a byte slice from the transform buffer pool. The returned slice has zero length but retained capacity.
func IsBlockElement ¶
IsBlockElement returns true if the tag is a known block-level element.
func IsDifferentDomain ¶ added in v1.2.0
IsDifferentDomain checks if two URLs have different domains. Returns false if either URL is not external.
func IsExternalURL ¶
IsExternalURL checks if a URL is an external HTTP(S) URL or protocol-relative URL.
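One plausible reading of this check is a simple prefix test, sketched below. The function name is hypothetical, and case-insensitive matching is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// isExternalURL reports whether raw starts with http://, https://, or the
// protocol-relative prefix "//". Scheme matching is case-insensitive here,
// which is an assumption of this sketch.
func isExternalURL(raw string) bool {
	lower := strings.ToLower(raw)
	return strings.HasPrefix(lower, "http://") ||
		strings.HasPrefix(lower, "https://") ||
		strings.HasPrefix(lower, "//")
}

func main() {
	fmt.Println(isExternalURL("https://example.com/a"))
	fmt.Println(isExternalURL("//cdn.example.com/lib.js"))
	fmt.Println(isExternalURL("/local/path"))
}
```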
func IsInlineElement ¶ added in v1.2.0
IsInlineElement returns true if the tag is a known inline element. Inline elements should not add newlines or paragraph spacing.
func IsKnownInlineNamespacePrefix ¶ added in v1.3.0
IsKnownInlineNamespacePrefix checks if the prefix is a known inline namespace prefix.
func IsNamespaceTag ¶ added in v1.3.0
IsNamespaceTag checks if a tag is a namespaced tag (contains ':'). Examples: ix:nonnumeric, xbrl:value, dei:CityAreaCode
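The namespace check and the prefix extraction described for GetNamespacePrefix reduce to locating the first colon. A small sketch with hypothetical lowercase helper names:

```go
package main

import (
	"fmt"
	"strings"
)

// isNamespaceTag reports whether tag contains a ':' separator.
func isNamespaceTag(tag string) bool {
	return strings.Contains(tag, ":")
}

// namespacePrefix returns the text before the first ':' or "" when the
// tag has no namespace. Returning "" for plain tags is an assumption.
func namespacePrefix(tag string) string {
	i := strings.IndexByte(tag, ':')
	if i < 0 {
		return ""
	}
	return tag[:i]
}

func main() {
	for _, tag := range []string{"ix:nonnumeric", "dei:CityAreaCode", "div"} {
		fmt.Printf("%s namespaced=%v prefix=%q\n", tag, isNamespaceTag(tag), namespacePrefix(tag))
	}
}
```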
func IsNonContentElement ¶
IsNonContentElement returns true if the tag is typically not part of main content.
func IsParagraphLevelBlockElement ¶ added in v1.3.0
IsParagraphLevelBlockElement returns true if the element is a block element that should be separated by paragraph spacing (double newlines) in the output.
Paragraph-level block elements create visual separation with blank lines in Markdown:
- Text containers: p, div, pre, blockquote
- Headings: h1-h6
- Semantic sections: article, section, main, figure, figcaption, address
- Lists: ul, ol, dl
- Tables: table
- Forms: fieldset
- Interactive: details, summary, dialog
- Media: canvas
Block elements WITHOUT paragraph spacing (treated as inline blocks):
- List items: li, dt, dd
- Table structure: thead, tbody, tfoot, tr, td, th
- Self-closing: hr
- Structural: body, html, head
- Semantic (non-content): nav, aside, header, footer, form
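The classification above maps naturally onto a set lookup. The sketch below copies the paragraph-level list from the text into a map; the variable and function names are hypothetical, and lowercasing the input is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// paragraphLevel holds the tags listed above as paragraph-level blocks.
var paragraphLevel = map[string]bool{
	"p": true, "div": true, "pre": true, "blockquote": true,
	"h1": true, "h2": true, "h3": true, "h4": true, "h5": true, "h6": true,
	"article": true, "section": true, "main": true, "figure": true,
	"figcaption": true, "address": true,
	"ul": true, "ol": true, "dl": true,
	"table": true, "fieldset": true,
	"details": true, "summary": true, "dialog": true,
	"canvas": true,
}

// isParagraphLevelBlock reports whether tag should be followed by a blank
// line in the output. Tags such as li, tr, or nav are absent from the set
// and therefore return false.
func isParagraphLevelBlock(tag string) bool {
	return paragraphLevel[strings.ToLower(tag)]
}

func main() {
	for _, tag := range []string{"p", "h3", "li", "nav"} {
		fmt.Printf("%s: %v\n", tag, isParagraphLevelBlock(tag))
	}
}
```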
func IsValidURL ¶ added in v1.2.0
IsValidURL checks if a URL is valid and safe for processing. This is a centralized URL validation function with size limits for security.
func IsVideoURL ¶
IsVideoURL checks whether a URL points to a video, based on its file extension or a known embed pattern.
func MatchesPattern ¶
MatchesPattern checks if value contains any pattern from the map with word boundaries. This is exported for testing purposes.
func NormalizeBaseURL ¶ added in v1.2.0
NormalizeBaseURL ensures a base URL ends with a slash. Returns empty string for non-HTTP URLs (javascript:, data:, mailto:, etc.).
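A minimal sketch of this normalization, assuming that only `http://` and `https://` prefixes count as HTTP URLs and that no other cleanup is performed (the real function may do more; `normalizeBaseURL` is a hypothetical name):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeBaseURL appends a trailing slash to HTTP(S) URLs that lack one
// and returns "" for any other scheme.
func normalizeBaseURL(base string) string {
	lower := strings.ToLower(base)
	if !strings.HasPrefix(lower, "http://") && !strings.HasPrefix(lower, "https://") {
		return ""
	}
	if strings.HasSuffix(base, "/") {
		return base
	}
	return base + "/"
}

func main() {
	fmt.Println(normalizeBaseURL("https://example.com"))
	fmt.Printf("%q\n", normalizeBaseURL("javascript:alert(1)"))
}
```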
func PutBuffer ¶ added in v1.3.0
PutBuffer returns a bytes.Buffer to the pool. The buffer is reset before being returned to the pool. It is safe to call PutBuffer with a nil pointer (no-op).
func PutBuilder ¶ added in v1.3.0
PutBuilder returns a strings.Builder to the pool. The builder is reset before being returned to the pool. It is safe to call PutBuilder with a nil pointer (no-op).
func PutHash128 ¶ added in v1.3.0
PutHash128 returns an FNV-128a hasher to the pool. The hasher is reset before being returned to the pool. It is safe to call PutHash128 with a nil pointer (no-op).
func PutTransformBuffer ¶ added in v1.3.0
func PutTransformBuffer(buf *[]byte)
PutTransformBuffer returns a byte slice to the transform buffer pool. The slice is reset to zero length before being returned. It is safe to call PutTransformBuffer with a nil pointer (no-op).
func RemoveTagContent ¶
RemoveTagContent removes all occurrences of the specified HTML tag and its content. This function uses string-based parsing as the primary method to handle edge cases like unclosed tags, malformed HTML, and to preserve original character case.
func ReplaceHTMLEntities ¶
ReplaceHTMLEntities replaces HTML entities with their corresponding characters. It handles both named entities (like &amp;amp;, &amp;nbsp;) and numeric entities (like &amp;#65;, &amp;#x41;). For unknown entities, it falls back to the standard library's html.UnescapeString. Optimized with a fast path for the most common entities.
Example ¶
ExampleReplaceHTMLEntities demonstrates the ReplaceHTMLEntities function.
input := " © 2025 — Test €100"
result := ReplaceHTMLEntities(input)
fmt.Println(result)
Output: © 2025 — Test €100
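The standard-library fallback mentioned above can be tried on its own. This sketch only exercises `html.UnescapeString`; the wrapper name `decode` is hypothetical and none of the package's fast-path logic is reproduced.

```go
package main

import (
	"fmt"
	"html"
)

// decode wraps the standard library's entity decoder.
func decode(s string) string {
	return html.UnescapeString(s)
}

func main() {
	// Named entities.
	fmt.Println(decode("&copy; 2025 &mdash; Test &euro;100"))
	// Decimal and hexadecimal numeric entities.
	fmt.Println(decode("&#65;&#x41;"))
}
```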
func ResolveURL ¶ added in v1.2.0
ResolveURL resolves a relative URL against a base URL. Handles absolute URLs, protocol-relative URLs, absolute paths, and relative paths.
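The four cases listed here match what `net/url` does with `ResolveReference`. The sketch below shows that standard-library route; `resolveURL` is a hypothetical name, and returning the input unchanged on a parse error is an assumption, not documented behavior.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveURL resolves ref against base following RFC 3986 rules.
// On a parse failure it returns ref unchanged.
func resolveURL(base, ref string) string {
	b, err := url.Parse(base)
	if err != nil {
		return ref
	}
	r, err := url.Parse(ref)
	if err != nil {
		return ref
	}
	return b.ResolveReference(r).String()
}

func main() {
	base := "https://example.com/a/b.html"
	for _, ref := range []string{
		"http://other.org/y",     // absolute URL
		"//cdn.example.com/x.js", // protocol-relative
		"/root.css",              // absolute path
		"../img/x.png",           // relative path
	} {
		fmt.Println(resolveURL(base, ref))
	}
}
```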
func SanitizeHTML ¶
func SanitizeHTMLWithAudit ¶ added in v1.3.0
func SanitizeHTMLWithAudit(htmlContent string, audit AuditRecorder) string
SanitizeHTMLWithAudit sanitizes HTML content and records security events. The audit recorder receives events for blocked tags, attributes, and URLs.
func ScoreAttributes ¶
ScoreAttributes calculates a score based on element attributes. This function delegates to the default Scorer implementation.
func ScoreContentNode ¶
ScoreContentNode calculates a relevance score for content extraction. Higher scores indicate more likely main content. Negative scores suggest non-content elements. This function delegates to the default Scorer implementation.
func SetPoolLogger ¶ added in v1.3.0
SetPoolLogger sets a logger function for pool corruption warnings. Pass nil to disable logging. This is a no-op if poolDebug is false. The logger function should be thread-safe.
func ShouldRemoveElement ¶
ShouldRemoveElement determines if a node should be removed from the content tree. This function delegates to the default Scorer implementation.
func ShouldTreatAsBlockElement ¶ added in v1.3.0
ShouldTreatAsBlockElement dynamically determines if an unknown/custom tag should be treated as a block-level element based on its structure and content. This enables proper handling of custom tag formats like SEC documents.
func ShouldTreatNamespaceTagAsInline ¶ added in v1.3.0
ShouldTreatNamespaceTagAsInline determines if a namespaced tag should be treated as an inline element based on context, content, and namespace.
func StringToBytes ¶ added in v1.3.0
StringToBytes converts a string to a byte slice without memory allocation. The returned slice shares memory with the original string.
WARNING: The returned slice MUST NOT be modified. Go strings are immutable, and modifying the returned slice would violate this immutability, potentially causing undefined behavior in other code holding references to the string.
Use this only for short-lived operations where the string is guaranteed to remain in scope, such as passing strings to functions that accept []byte.
func TableProcessor ¶ added in v1.3.0
TableProcessor returns the table processor with default accessor and walker.
Types ¶
type AuditRecorder ¶ added in v1.3.0
type AuditRecorder interface {
// RecordBlockedTag records when a dangerous tag is removed.
RecordBlockedTag(tag string)
// RecordBlockedAttr records when a dangerous attribute is removed.
RecordBlockedAttr(attr, value string)
// RecordBlockedURL records when a dangerous URL is blocked.
RecordBlockedURL(url, reason string)
}
AuditRecorder defines the interface for recording security audit events. This interface is used internally to decouple the sanitization code from the main audit implementation.
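To show what an implementation of this interface might look like, the sketch below redeclares the interface locally and provides a hypothetical `countingRecorder` that tallies events. It is not part of the package.

```go
package main

import (
	"fmt"
	"sync"
)

// AuditRecorder mirrors the interface shown above so the sketch compiles
// on its own.
type AuditRecorder interface {
	RecordBlockedTag(tag string)
	RecordBlockedAttr(attr, value string)
	RecordBlockedURL(url, reason string)
}

// countingRecorder is a hypothetical recorder that counts each event kind.
// The mutex makes it safe to share between goroutines.
type countingRecorder struct {
	mu                sync.Mutex
	tags, attrs, urls int
}

func (r *countingRecorder) RecordBlockedTag(tag string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.tags++
}

func (r *countingRecorder) RecordBlockedAttr(attr, value string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.attrs++
}

func (r *countingRecorder) RecordBlockedURL(url, reason string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.urls++
}

// Compile-time check that countingRecorder satisfies the interface.
var _ AuditRecorder = (*countingRecorder)(nil)

func main() {
	rec := &countingRecorder{}
	rec.RecordBlockedTag("script")
	rec.RecordBlockedAttr("onclick", "steal()")
	rec.RecordBlockedURL("javascript:alert(1)", "dangerous scheme")
	fmt.Println(rec.tags, rec.attrs, rec.urls)
}
```

A recorder like this could be passed wherever an AuditRecorder is accepted, for example to SanitizeHTMLWithAudit.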
type Cache ¶
type Cache struct {
// contains filtered or unexported fields
}
Cache is a thread-safe LRU cache with optional TTL support. It uses a doubly-linked list for LRU ordering with sentinel nodes to simplify edge case handling.
TTL Behavior:
- ttl > 0: Entries expire after the specified duration
- ttl = 0: Entries never expire based on time (only LRU eviction)
- ttl < 0: Treated as 0 (no time-based expiration)
Thread Safety: All public methods are safe for concurrent use. Get() uses a write lock to prevent TOCTOU race conditions.
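The following is a compact sketch of an LRU cache with optional TTL, under stated assumptions: it substitutes the standard `container/list` for the hand-rolled sentinel-node list described above, stores string values only, and uses hypothetical names throughout. It illustrates the TTL rules and the reason Get takes the write lock (a hit reorders the list), not the package's actual code.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
	"time"
)

type entry struct {
	key, value string
	expiresAt  time.Time // zero value means "never expires"
}

type lruCache struct {
	mu         sync.Mutex
	maxEntries int
	ttl        time.Duration
	order      *list.List // front = most recently used
	items      map[string]*list.Element
}

func newLRUCache(maxEntries int, ttl time.Duration) *lruCache {
	if ttl < 0 {
		ttl = 0 // negative TTL is treated as "no expiration"
	}
	return &lruCache{maxEntries: maxEntries, ttl: ttl,
		order: list.New(), items: make(map[string]*list.Element)}
}

func (c *lruCache) Set(key, value string) {
	if c.maxEntries <= 0 {
		return // cache disabled
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	var exp time.Time
	if c.ttl > 0 {
		exp = time.Now().Add(c.ttl)
	}
	if el, ok := c.items[key]; ok {
		e := el.Value.(*entry)
		e.value, e.expiresAt = value, exp
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value, expiresAt: exp})
	if c.order.Len() > c.maxEntries {
		oldest := c.order.Back() // least recently used
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Get mutates LRU order (and may delete an expired entry), so it needs
// the exclusive lock rather than a read lock.
func (c *lruCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	e := el.Value.(*entry)
	if !e.expiresAt.IsZero() && time.Now().After(e.expiresAt) {
		c.order.Remove(el)
		delete(c.items, key)
		return "", false
	}
	c.order.MoveToFront(el)
	return e.value, true
}

func (c *lruCache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.order.Len()
}

func main() {
	c := newLRUCache(2, time.Hour)
	c.Set("a", "1")
	c.Set("b", "2")
	c.Get("a")      // "a" becomes most recently used
	c.Set("c", "3") // evicts "b"
	_, hasB := c.Get("b")
	fmt.Println(hasB, c.Len())
}
```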
func NewCache ¶
NewCache creates a new LRU cache with the specified maximum entries and TTL. If maxEntries is 0 or negative, the cache is disabled (Set becomes a no-op). If ttl is 0 or negative, entries never expire based on time.
func (*Cache) Len ¶ added in v1.3.0
Len returns the current number of entries in the cache. This is useful for monitoring and debugging.
func (*Cache) StartCleanup ¶ added in v1.3.0
func (c *Cache) StartCleanup(interval time.Duration) context.CancelFunc
StartCleanup starts a background goroutine that periodically cleans up expired entries. This is useful when TTL is enabled and the cache receives many one-time accesses, as expired entries would otherwise only be cleaned when accessed or during eviction.
The cleanup goroutine runs at the specified interval until StopCleanup is called or the cache is garbage collected. If interval is 0, DefaultCacheCleanupInterval is used.
This method is idempotent - calling it multiple times has no additional effect.
IMPORTANT: While runtime.SetFinalizer ensures cleanup when the Cache is garbage collected, it is still recommended to call StopCleanup() explicitly for deterministic resource release, especially in long-running applications.
Usage:
cache := NewCache(1000, time.Hour)
cache.StartCleanup(5 * time.Minute)
defer cache.StopCleanup()
func (*Cache) StopCleanup ¶ added in v1.3.0
func (c *Cache) StopCleanup()
StopCleanup stops the background cleanup goroutine if it was started. It is safe to call this method multiple times. This method also clears the finalizer to prevent double cleanup.
type DefaultScorer ¶ added in v1.3.0
type DefaultScorer struct {
// contains filtered or unexported fields
}
DefaultScorer is the default implementation of the Scorer interface.
func NewDefaultScorer ¶ added in v1.3.0
func NewDefaultScorer() *DefaultScorer
NewDefaultScorer creates a new DefaultScorer with the default configuration.
func NewDefaultScorerWithConfig ¶ added in v1.3.0
func NewDefaultScorerWithConfig(config *ScoringConfig) *DefaultScorer
NewDefaultScorerWithConfig creates a new DefaultScorer with custom configuration. If config is nil, the default configuration is used.
func (*DefaultScorer) Score ¶ added in v1.3.0
func (s *DefaultScorer) Score(node *html.Node) int
Score calculates a relevance score for a content node.
func (*DefaultScorer) ScoreAttributes ¶ added in v1.3.0
func (s *DefaultScorer) ScoreAttributes(n *html.Node) int
ScoreAttributes calculates a score based on element attributes. This is the public version for external use.
func (*DefaultScorer) ShouldRemove ¶ added in v1.3.0
func (s *DefaultScorer) ShouldRemove(node *html.Node) bool
ShouldRemove determines if a node should be removed from the content tree.
type EncodingDetector ¶ added in v1.2.0
type EncodingDetector struct {
// User-specified encoding override (optional)
ForcedEncoding string
// Smart detection options
EnableSmartDetection bool // Enable intelligent encoding detection
MaxSampleSize int // Max bytes to analyze for statistical detection (default: 10KB, max: 1MB)
}
EncodingDetector handles charset detection and conversion.
IMPORTANT: The data slice passed to detection methods must not be modified during the detection process. For concurrent access, pass a copy of the data.
func NewEncodingDetector ¶ added in v1.2.0
func NewEncodingDetector() *EncodingDetector
NewEncodingDetector creates a new encoding detector with smart detection enabled. The default MaxSampleSize is 10KB which is sufficient for most HTML documents.
func (*EncodingDetector) DetectAndConvert ¶ added in v1.2.0
func (ed *EncodingDetector) DetectAndConvert(data []byte) ([]byte, string, error)
DetectAndConvert detects the charset and converts the data to UTF-8 in one step.
func (*EncodingDetector) DetectCharset ¶ added in v1.2.0
func (ed *EncodingDetector) DetectCharset(data []byte) string
DetectCharset attempts to detect the character encoding from HTML content.
func (*EncodingDetector) DetectCharsetBasic ¶ added in v1.2.0
func (ed *EncodingDetector) DetectCharsetBasic(data []byte) string
DetectCharsetBasic performs basic charset detection (BOM, meta tags, UTF-8 validation). It is optimized with a fast path for pure ASCII/UTF-8 content to avoid string allocation.
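Two of the three basic steps, BOM sniffing and UTF-8 validation, can be sketched with the standard library alone. Meta-tag parsing is omitted, the charset names returned are assumptions, and `detectBasic` is a hypothetical name.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// detectBasic checks for a byte order mark first and then falls back to
// UTF-8 validation. It returns "" when neither step is conclusive.
func detectBasic(data []byte) string {
	switch {
	case bytes.HasPrefix(data, []byte{0xEF, 0xBB, 0xBF}):
		return "utf-8"
	case bytes.HasPrefix(data, []byte{0xFF, 0xFE}):
		return "utf-16le"
	case bytes.HasPrefix(data, []byte{0xFE, 0xFF}):
		return "utf-16be"
	}
	// Pure ASCII is valid UTF-8, so this also covers the ASCII case.
	if utf8.Valid(data) {
		return "utf-8"
	}
	return ""
}

func main() {
	fmt.Println(detectBasic([]byte("plain ascii")))
	fmt.Println(detectBasic([]byte{0xFF, 0xFE, 'h', 0x00}))
	fmt.Printf("%q\n", detectBasic([]byte{0xC3, 0x28}))
}
```

An empty result is where a statistical method such as DetectCharsetSmart would presumably take over.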
func (*EncodingDetector) DetectCharsetSmart ¶ added in v1.2.0
func (ed *EncodingDetector) DetectCharsetSmart(data []byte) EncodingMatch
DetectCharsetSmart performs intelligent charset detection using statistical analysis.
func (*EncodingDetector) SetMaxSampleSize ¶ added in v1.3.0
func (ed *EncodingDetector) SetMaxSampleSize(size int) *EncodingDetector
SetMaxSampleSize sets the maximum sample size for statistical detection. Values <= 0 use the default (10KB). Values > 1MB are capped at 1MB to prevent memory exhaustion. This method returns the detector for method chaining.
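The clamping rule and the chaining style can be illustrated with a stand-in type. The `detector` type and both constants below are hypothetical, and treating 10KB as 10 * 1024 bytes is an assumption.

```go
package main

import "fmt"

const (
	defaultSampleSize = 10 * 1024   // assumed meaning of "10KB"
	maxSampleLimit    = 1024 * 1024 // assumed meaning of "1MB"
)

// detector is a hypothetical stand-in for EncodingDetector.
type detector struct {
	maxSampleSize int
}

// SetMaxSampleSize applies the documented rules: non-positive values use
// the default, values above the limit are capped, and the receiver is
// returned so calls can be chained.
func (d *detector) SetMaxSampleSize(size int) *detector {
	switch {
	case size <= 0:
		d.maxSampleSize = defaultSampleSize
	case size > maxSampleLimit:
		d.maxSampleSize = maxSampleLimit
	default:
		d.maxSampleSize = size
	}
	return d
}

func main() {
	d := (&detector{}).SetMaxSampleSize(5 << 20)
	fmt.Println(d.maxSampleSize == maxSampleLimit)
}
```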
type EncodingMatch ¶ added in v1.2.0
type EncodingMatch struct {
Charset string
Confidence int // 0-100
Score int // Detailed score
Valid bool // Whether decoding produced valid UTF-8
}
EncodingMatch represents a detected encoding with a confidence score.
type NoOpAuditRecorder ¶ added in v1.3.0
type NoOpAuditRecorder struct{}
NoOpAuditRecorder is an audit recorder that does nothing. Used when audit logging is disabled.
func (NoOpAuditRecorder) RecordBlockedAttr ¶ added in v1.3.0
func (NoOpAuditRecorder) RecordBlockedAttr(attr, value string)
RecordBlockedAttr does nothing.
func (NoOpAuditRecorder) RecordBlockedTag ¶ added in v1.3.0
func (NoOpAuditRecorder) RecordBlockedTag(tag string)
RecordBlockedTag does nothing.
func (NoOpAuditRecorder) RecordBlockedURL ¶ added in v1.3.0
func (NoOpAuditRecorder) RecordBlockedURL(url, reason string)
RecordBlockedURL does nothing.
type Scorer ¶ added in v1.3.0
type Scorer interface {
// Score calculates a relevance score for a content node.
// Higher scores indicate more likely main content.
Score(node *html.Node) int
// ShouldRemove determines if a node should be removed from the content tree.
ShouldRemove(node *html.Node) bool
}
Scorer defines the interface for content scoring algorithms. Implementations can provide custom scoring logic for content extraction.
type ScoringConfig ¶ added in v1.3.0
type ScoringConfig struct {
// PositiveStrongPatterns maps pattern strings to their strong positive scores.
PositiveStrongPatterns map[string]int
// PositiveMediumPatterns maps pattern strings to their medium positive scores.
PositiveMediumPatterns map[string]int
// NegativeStrongPatterns maps pattern strings to their strong negative scores.
NegativeStrongPatterns map[string]int
// NegativeMediumPatterns maps pattern strings to their medium negative scores.
NegativeMediumPatterns map[string]int
// NegativeWeakPatterns maps pattern strings to their weak negative scores.
NegativeWeakPatterns map[string]int
// RemovePatterns maps pattern strings to a boolean indicating removal.
RemovePatterns map[string]bool
// TagScores maps tag names to their base scores.
TagScores map[string]int
}
ScoringConfig holds the configuration for the default scorer.
func DefaultScoringConfig ¶ added in v1.3.0
func DefaultScoringConfig() *ScoringConfig
DefaultScoringConfig returns the default scoring configuration.
Source Files ¶
Directories ¶
| Path | Synopsis |
|---|---|
| table | Package table provides HTML table extraction and rendering functionality. |
| testutil | Package testutil provides common test utilities for the html package. |