=== HrefHawk ===
Contributors: woracious
Tags: internal links, seo, interlinking, content links, link suggestions
Requires at least: 6.0
Tested up to: 7.0
Stable tag: 0.1.0
Requires PHP: 7.4
License: GPLv2 or later
License URI: https://www.gnu.org/licenses/gpl-2.0.html

Finds internal linking opportunities between your posts and lets you accept or reject each suggestion from the editor sidebar.

== Description ==

HrefHawk is an internal linking plugin that reads your content, extracts meaningful phrases, scores them for relevance, and suggests where one post should link to another. No external API calls. No third-party dependencies. Everything runs inside your WordPress installation.

The Lexical engine scans every published post through a seven-stage cleaning pipeline that strips Gutenberg markup, shortcodes, HTML tags, entities, Unicode artefacts, and punctuation. The cleaned text is split into sentences, then into phrases at a configurable depth of 1 to 5 words. Each phrase is indexed with an MD5 hash and mapped back to its source post, with frequency, position, heading presence, title presence, and first-paragraph presence recorded as scoring signals.
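As an illustration of the clean, split, and hash steps described above, here is a minimal sketch. It is written in Python for brevity (the plugin itself runs in PHP), it covers only some of the cleaning stages, and the function names `clean` and `phrases` and the `max_words` parameter are hypothetical, not HrefHawk's actual API.

```python
import hashlib
import html
import re


def clean(text: str) -> str:
    """Reduce raw post content to plain text, keeping sentence punctuation."""
    text = re.sub(r"<!--.*?-->", " ", text, flags=re.S)  # Gutenberg block comments
    text = re.sub(r"\[[^\]]*\]", " ", text)              # shortcodes (naive: any [..])
    text = re.sub(r"<[^>]+>", " ", text)                 # HTML tags
    text = html.unescape(text)                           # HTML entities
    return re.sub(r"\s+", " ", text).strip()             # collapse whitespace


def phrases(text: str, max_words: int = 3):
    """Yield (phrase, md5_hex) for every 1..max_words window in each sentence."""
    for sentence in re.split(r"[.!?]+", clean(text)):
        # Lowercase and drop remaining punctuation; phrases never cross sentences.
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                phrase = " ".join(words[i:i + n])
                yield phrase, hashlib.md5(phrase.encode("utf-8")).hexdigest()
```

Hashing the normalised phrase gives a fixed-length key, so the same phrase found in different posts maps to the same index entry.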

After scanning, the scoring engine evaluates every phrase-to-post relationship using five weighted scorers: TF-IDF (term uniqueness across your corpus), phrase length (longer phrases score higher), structural context (heading and first-paragraph bonuses), title matching (exact phrase appears in the target post title), and category matching (source and target share a taxonomy term). Each scorer contributes a weighted component to the final score. In addition, recency scoring biases suggestions toward fresher content.
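The weighted combination can be pictured roughly as follows. This Python sketch uses the textbook TF-IDF formula and a plain weighted sum; the weight values, the component names, and the assumption that each component is normalised to the 0..1 range are illustrative guesses, not the plugin's documented behaviour. Recency is left out for simplicity.

```python
import math

# Hypothetical weights for the five scorers; the real values are not documented here.
WEIGHTS = {
    "tfidf": 0.40,
    "length": 0.20,
    "structure": 0.15,
    "title": 0.15,
    "category": 0.10,
}


def tf_idf(freq_in_post: int, posts_with_phrase: int, total_posts: int) -> float:
    """Textbook TF-IDF: term frequency times log inverse document frequency."""
    return freq_in_post * math.log(total_posts / posts_with_phrase)


def final_score(components: dict) -> float:
    """Weighted sum of scorer components; missing components count as 0."""
    return sum(weight * components.get(name, 0.0) for name, weight in WEIGHTS.items())
```

A phrase that appears in every post gets a TF-IDF of zero under this formula, which is why very common phrases make poor link anchors.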

Suggestions appear in the post editor sidebar under the HrefHawk panel. Each suggestion shows the source phrase, the target post, and the computed score. You accept or reject each suggestion individually. Accepted suggestions are committed as live links in your post content. Rejected suggestions are hidden permanently. You can revoke an accepted link at any time, which removes the anchor tag from the content and returns the suggestion to pending state.
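One way an accept and revoke round trip could work is sketched below in Python. The `data-suggestion` marker attribute and the `accept` and `revoke` function names are assumptions made for this example; how HrefHawk actually identifies its own anchors is not described here. The sketch also ignores the case where the phrase already sits inside an existing link or tag attribute.

```python
import html
import re


def accept(content: str, phrase: str, url: str, suggestion_id: int) -> str:
    """Wrap the first occurrence of `phrase` in an anchor tagged with the suggestion id."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)

    def link(match):
        return '<a href="%s" data-suggestion="%d">%s</a>' % (
            html.escape(url, quote=True), suggestion_id, match.group(0))

    return pattern.sub(link, content, count=1)


def revoke(content: str, suggestion_id: int) -> str:
    """Remove only the anchor carrying this suggestion id, keeping its text."""
    pattern = re.compile(
        r'<a [^>]*data-suggestion="%d"[^>]*>(.*?)</a>' % suggestion_id, re.S)
    return pattern.sub(lambda match: match.group(1), content, count=1)
```

Tagging the inserted anchor is what makes a revoke safe in this sketch: links the author added by hand carry no marker and are never touched.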

The scan runs on a burst-aware pipeline that calibrates itself to your server's timeout limits. A loopback test measures available execution time, then the Dispatcher fires sequential bursts that process posts in batches. When a burst runs out of time it saves its resume position, and the next burst continues from there. This lets long scans across thousands of posts complete without being cut off by PHP timeout limits or memory ceilings.
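The burst-and-resume idea can be sketched as a loop with a time budget. The Python below is a simplified model under stated assumptions: `run_burst` and `run_scan` are hypothetical names, the budget stands in for the value the loopback calibration would produce, and the bursts run in one process rather than as separate requests.

```python
import time


def run_burst(post_ids, start, budget_seconds, process, clock=time.monotonic):
    """Process posts from index `start` until the time budget is used up.

    Always handles at least one post so a scan cannot stall, and returns the
    index to resume from (len(post_ids) when the scan is finished).
    """
    deadline = clock() + budget_seconds
    position = start
    while position < len(post_ids):
        process(post_ids[position])
        position += 1
        if clock() >= deadline:
            break  # out of time: hand the resume position back to the caller
    return position


def run_scan(post_ids, budget_seconds, process, clock=time.monotonic):
    """Fire sequential bursts until every post is processed; return the burst count."""
    position, bursts = 0, 0
    while position < len(post_ids):
        position = run_burst(post_ids, position, budget_seconds, process, clock)
        bursts += 1
    return bursts
```

Because each burst only needs the saved position to continue, no single request has to outlive the server's execution limit.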

Auto-rescan fires 60 seconds after each post save, rescanning only the saved post. This keeps the phrase index current without requiring a full site-wide scan after every edit. Auto-rescan can be paused during bulk edits from the Settings page.

Maintenance runs automatically: a daily orphan cleanup removes database rows that reference deleted posts, and a weekly table optimisation reclaims fragmented space and rebuilds indexes.

All diagnostic output routes through a dedicated Logger that writes daily-rotated log files with .htaccess protection. Debug logging is off by default and toggled from the Settings page.

== Installation ==

1. Upload the `href-hawk` folder to `wp-content/plugins/`.
2. Activate the plugin through the Plugins menu in WordPress.
3. Navigate to HrefHawk in the admin sidebar.
4. Import the default English stop words list from the Settings page, Stop Words tab.
5. Run your first scan from the Lexical Scan page.

== Frequently Asked Questions ==

= Does HrefHawk send data to external services? =

No. The Lexical engine runs entirely inside your WordPress installation. No content is transmitted to any external server. The only network call is a loopback request to your own site during scan calibration.

= How long does a scan take? =

Scan time depends on the number of published posts and average content length. The burst pipeline calibrates to your server and processes posts in timed batches. A site with 500 posts typically completes in under 2 minutes. Sites with 5,000+ posts may take 10-15 minutes. The scan runs in the background and can be monitored from the Scan page.

= Can I undo an accepted link? =

Yes. Every accepted link can be revoked from the editor sidebar. Revoking removes the anchor tag from the post content and returns the suggestion to pending state.

= Does the plugin modify my post content? =

Only when you explicitly accept a suggestion. Accepting inserts an anchor tag around the matched phrase in your post content. Rejecting or ignoring a suggestion makes no changes. Revoking an accepted link removes only the anchor tag that HrefHawk inserted.

= What is the stop words list? =

Stop words are common words (the, and, is, of, etc.) that are trimmed from the leading and trailing edges of extracted phrases. Interior stop words are preserved. "The WordPress plugin" becomes "WordPress plugin" but "state of the art" stays intact. You can import the default English list or manage your own entries from the Settings page.
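The edge-trimming rule can be expressed in a few lines. This Python sketch is illustrative only: `trim_stop_words` is a hypothetical name and the stop word set is a tiny sample, not the default English list.

```python
STOP_WORDS = {"the", "and", "is", "of", "a", "an"}  # tiny illustrative subset


def trim_stop_words(phrase: str, stop_words=STOP_WORDS) -> str:
    """Drop stop words from both edges of a phrase, leaving interior ones alone."""
    words = phrase.split()
    start, end = 0, len(words)
    while start < end and words[start].lower() in stop_words:
        start += 1
    while end > start and words[end - 1].lower() in stop_words:
        end -= 1
    return " ".join(words[start:end])
```

A phrase made up entirely of stop words trims down to an empty string, which a caller would presumably discard.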

== Changelog ==

= 0.1.0 =
* Initial release.
* Lexical scanning engine with seven-stage cleaning pipeline.
* Phrase extraction at configurable depth (1 to 5 words).
* Five weighted scorers: TF-IDF, phrase length, structural context, title matching, category matching.
* Recency scoring for freshness bias.
* Editor sidebar panel with accept, reject, and revoke workflow.
* Burst-aware pipeline with server timeout calibration.
* Auto-rescan on post save with pause toggle.
* Daily orphan cleanup and weekly table optimisation.
* Debug logging with daily rotation and .htaccess protection.
* English stop words list included.

== Upgrade Notice ==

= 0.1.0 =
Initial release.
