Today's ask Twitter for prior art:
What's a performant + reasonable approach to determine whether, for a given http URL, the "same" HTML content is available over https?
Assume that the page will have stuff like absolute URLs with different schemes and CSRF tokens in the body which should be treated as the "same" content for this use case.
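One way to read the constraint above is as a normalization step applied to both responses before comparing them. This is a rough sketch, not a known prior-art solution: the `normalize` function and the regexes are assumptions made for illustration, and the CSRF pattern only covers `<input>`/`<meta>` tags whose `name` contains "csrf" and appears before the `value`/`content` attribute.

```python
import re

# Assumption: scheme differences show up as literal "http://" prefixes in the body.
SCHEME = re.compile(r'\bhttp://', re.IGNORECASE)

# Assumption: CSRF tokens live in <input> or <meta> tags whose name contains "csrf",
# with the name attribute appearing before value/content. Real pages may differ.
CSRF_ATTR = re.compile(
    r'(<(?:input|meta)\b[^>]*\bname=["\'][^"\']*csrf[^"\']*["\'][^>]*'
    r'\b(?:value|content)=["\'])[^"\']*(["\'])',
    re.IGNORECASE,
)

def normalize(html: str) -> str:
    """Rewrite http:// URLs to https:// and mask CSRF token values."""
    html = SCHEME.sub('https://', html)
    html = CSRF_ATTR.sub(r'\1CSRF\2', html)
    return html

# Two bodies that differ only in scheme and token normalize to the same string.
http_body = '<a href="http://example.com/x"><input name="csrf_token" value="abc123">'
https_body = '<a href="https://example.com/x"><input name="csrf_token" value="zzz999">'
print(normalize(http_body) == normalize(https_body))
```

A regex pass like this is cheap but brittle; parsing the HTML and comparing the resulting trees would be sturdier at a higher cost.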
If they use a common template, it might be the same HTML structure with different content. Can’t you diff the two raw HTML documents and allow a % variance? That allows for stock quotes that update, or timestamps, but is essentially the same content.
Aug 6, 2018 · 5:15 PM UTC
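The "diff with a % variance" idea from the reply could be sketched with Python's standard `difflib`. The function name `similar_enough` and the 5% default threshold are assumptions for illustration, not values anyone in the thread suggested. Comparing whitespace-separated tokens instead of characters keeps the sequences shorter, though `SequenceMatcher` can still be slow on very large pages.

```python
from difflib import SequenceMatcher

def similar_enough(html_a: str, html_b: str, max_variance: float = 0.05) -> bool:
    """Return True if the two documents differ by at most max_variance.

    Variance is 1 minus the SequenceMatcher similarity ratio, computed
    over whitespace-separated tokens of each document.
    """
    matcher = SequenceMatcher(None, html_a.split(), html_b.split(), autojunk=False)
    return (1.0 - matcher.ratio()) <= max_variance

# A page where a single value changed (e.g. a stock quote) stays under the threshold.
page_a = " ".join(f"<p>line {i}</p>" for i in range(50))
page_b = page_a.replace("line 7</p>", "line 7b</p>")
print(similar_enough(page_a, page_b))
print(similar_enough(page_a, "<html>totally different</html>"))
```

The right threshold depends on the site: a page that is mostly shared template will score as similar even when the real content differs, so the cutoff would need tuning per use case.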

