app

package
v1.4.0 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 3, 2026 License: Apache-2.0 Imports: 19 Imported by: 0

Documentation

Overview

Package app provides the main entry point and task scheduling for the crawler application.

Index

Constants

This section is empty.

Variables

var LogicApp = New()

LogicApp is the global singleton instance of the core App interface.

Functions

This section is empty.

Types

type App

type App interface {
	SetLog(io.Writer) App                                         // Set the global log output target
	LogGoOn() App                                                 // Resume log output
	LogRest() App                                                 // Pause log output
	Init(mode int, port int, master string, w ...io.Writer) App   // Must call Init before using App (except SetLog)
	ReInit(mode int, port int, master string, w ...io.Writer) App // Switch run mode and reset log output target
	GetAppConf(k ...string) interface{}                           // Get global config
	SetAppConf(k string, v interface{}) App                       // Set global config (not called in client mode)
	SpiderPrepare(original []*spider.Spider) App                  // Must call after setting global params and before Run() (not called in client mode)
	Run()                                                         // Block until task completes (call after all config is done)
	Stop()                                                        // Terminate task mid-run in Offline mode (blocks until current task stops)
	IsRunning() bool                                              // Check if task is running
	IsPaused() bool                                               // Check if task is paused
	IsStopped() bool                                              // Check if task has stopped
	PauseRecover()                                                // Pause or resume task in Offline mode
	Status() int                                                  // Return current status
	GetSpiderLib() []*spider.Spider                               // Get all spider species
	GetSpiderByName(string) option.Option[*spider.Spider]         // Get spider by name
	GetSpiderQueue() crawler.SpiderQueue                          // Get spider queue interface
	GetOutputLib() []string                                       // Get all output methods
	GetTaskJar() *distribute.TaskJar                              // Return task jar
	distribute.Distributor                                        // Implements distributed interface
}

func New

func New() App

type Logic

type Logic struct {
	*cache.AppConf        // Global config
	*spider.SpiderSpecies // All spider species
	crawler.SpiderQueue   // Spider queue for current task
	*distribute.TaskJar   // Task storage passed between server and client
	crawler.CrawlerPool   // Crawler pool
	teleport.Teleport     // Socket duplex communication, JSON transport

	sync.RWMutex
	// contains filtered or unexported fields
}

func (*Logic) CountNodes

func (l *Logic) CountNodes() int

CountNodes returns the number of connected nodes in server/client mode.

func (*Logic) GetAppConf

func (l *Logic) GetAppConf(k ...string) interface{}

GetAppConf returns global config value(s).

func (*Logic) GetMode

func (l *Logic) GetMode() int

GetMode returns current run mode.

func (*Logic) GetOutputLib

func (l *Logic) GetOutputLib() []string

GetOutputLib returns all output methods.

func (*Logic) GetSpiderByName

func (l *Logic) GetSpiderByName(name string) option.Option[*spider.Spider]

GetSpiderByName returns a spider by name.

func (*Logic) GetSpiderLib

func (l *Logic) GetSpiderLib() []*spider.Spider

GetSpiderLib returns all spider species.

func (*Logic) GetSpiderQueue

func (l *Logic) GetSpiderQueue() crawler.SpiderQueue

GetSpiderQueue returns the spider queue interface.

func (*Logic) GetTaskJar

func (l *Logic) GetTaskJar() *distribute.TaskJar

GetTaskJar returns the task jar.

func (*Logic) Init

func (l *Logic) Init(mode int, port int, master string, w ...io.Writer) App

Init initializes the app; must be called before use (except SetLog).

func (*Logic) IsPaused added in v1.4.0

func (l *Logic) IsPaused() bool

IsPaused reports whether the task is paused.

func (*Logic) IsRunning

func (l *Logic) IsRunning() bool

IsRunning reports whether the task is running.

func (*Logic) IsStopped

func (l *Logic) IsStopped() bool

IsStopped reports whether the task has stopped.

func (*Logic) LogGoOn

func (l *Logic) LogGoOn() App

LogGoOn resumes log output.

func (*Logic) LogRest

func (l *Logic) LogRest() App

LogRest pauses log output.

func (*Logic) PauseRecover

func (l *Logic) PauseRecover()

PauseRecover pauses or resumes the task in Offline mode.

func (*Logic) ReInit

func (l *Logic) ReInit(mode int, port int, master string, w ...io.Writer) App

ReInit switches the run mode and resets the log output target; call it when changing mode.

func (*Logic) Run

func (l *Logic) Run()

Run executes the task, blocking until it completes.

func (*Logic) SetAppConf

func (l *Logic) SetAppConf(k string, v interface{}) App

SetAppConf sets a global config value.

func (*Logic) SetLog

func (l *Logic) SetLog(w io.Writer) App

SetLog sets global log output to the given writer.

func (*Logic) SpiderPrepare

func (l *Logic) SpiderPrepare(original []*spider.Spider) App

SpiderPrepare must be called after setting global parameters and immediately before Run(). The original argument is the raw list of spider species from the spider package, with no prior assignment. Spiders with an explicit Keyin are not reassigned. Not called in client mode.

func (*Logic) Status

func (l *Logic) Status() int

Status returns current run status.

func (*Logic) Stop

func (l *Logic) Stop()

Stop terminates the task mid-run in Offline mode, blocking until the current task stops.

Directories

Path Synopsis
aid
history
Package history provides persistence and inheritance of success and failure request records.
proxy
Package proxy provides proxy IP pool management and online filtering.
crawler
Package crawler provides the core crawler engine for request scheduling and page downloading.
distribute
Package distribute provides distributed task scheduling and master-slave node communication.
teleport
Package teleport provides a high-concurrency API framework for distributed systems.
downloader
Package downloader defines the page downloader interface.
request
Package request provides encapsulation and deduplication of crawl requests.
surfer
Package surfer provides a high-concurrency web downloader written in Go.
surfer/agent
Package agent generates user-agent strings for well-known browsers and for custom browsers.
surfer/example command
pipeline
Package pipeline provides the data collection and output pipeline.
collector
Package collector implements result collection and output.
collector/data
Package data provides storage structure definitions for data and file cells.
scheduler
Package scheduler provides crawl task scheduling and resource allocation.
spider
Package spider provides spider rule definition, species registration, and parsing.
common
Package common provides HTML cleaning, form parsing, and other utility functions for spider rules.
