- The paper presents a novel steganography technique that embeds secret data within HTML web pages by exploiting the case-insensitivity of HTML tags, ensuring imperceptibility.
- The method encodes data bits by altering the case of characters in HTML tags (uppercase for '1', lowercase for '0') and extracts the data by reading the character casing in the source code.
- This technique provides a non-destructive approach to data hiding in web pages, avoiding visual distortions common in image steganography and offering versatility for other case-insensitive languages.
The paper "Embedding Secret Data in Html Web Page" presents a steganographic method for hiding secret data inside HTML web pages. It relies on the case-insensitivity of HTML tags: writing a tag as <head> or <HEAD> makes no functional difference to the browser. Because every letter in a tag can be written in either case with the same result, the choice of case is redundant information, and the method uses that redundancy to carry hidden bits without any visible change to the rendered page.
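The case-insensitivity that the method depends on can be observed with an ordinary HTML parser. The following is a minimal sketch, not taken from the paper, using Python's standard `html.parser` module, which reports tag names converted to lower case. The names `TagCollector` and `start_tags` are hypothetical helpers introduced only for this illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes the tag name already converted to lower case.
        self.tags.append(tag)


def start_tags(html: str) -> list:
    """Return the start-tag names seen in the given HTML source."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


plain = "<html><body><p>hi</p></body></html>"
recased = "<HTML><BoDy><p>hi</p></BoDy></HTML>"

# Both spellings yield the same tag sequence, so letter case in tag
# names is free to carry hidden information.
print(start_tags(plain) == start_tags(recased))
```

The two inputs differ only in the case of letters inside tags, and the parser sees the same document structure for both.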
Key Concepts and Methodology
- Redundancy and Imperceptibility: The authors treat these as the two conditions required for effective data hiding. HTML's case insensitivity supplies the redundancy (each tag letter has two equivalent spellings), and because changing the case of letters within tags does not alter the page's appearance in the browser, the change is imperceptible.
- Data Embedding Process:
- Selection of Characters: The method targets alphabetic characters within HTML tags as candidates for embedding secret data.
- Encoding Process: To embed a secret bit, the case of a character is set according to the bit: uppercase represents a binary '1', while lowercase represents '0'. The change is imperceptible because HTML interpreters ignore letter case in tags.
- Encoding Functions:
- Two key functions, l(c) and u(c), are defined for case manipulation:
- l(c) converts an uppercase character to lowercase.
- u(c) converts a lowercase character to uppercase.
- Algorithm Specifics:
- The embedding algorithm scans the HTML for tags and selects characters to apply the transformations dictated by the secret bit sequence.
- The algorithm uses a predefined function fstego, which decides, based on the current secret bit, whether a case transformation should be applied to the selected character.
- The length of the secret data can be encoded within a header tag, optionally encrypted for added security, so that the extractor knows how many bits to read.
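The embedding steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's algorithm: only letters in tag names carry bits (attributes are left untouched), the payload is passed as a string of '0' and '1' characters, no length header is written, and the cover page contains no script or text in which `<` is directly followed by a letter. The functions `l` and `u` mirror the paper's l(c) and u(c); `embed` and `TAG_NAME` are hypothetical names.

```python
import re

# Matches "<" or "</" followed by a tag name, e.g. "<head" or "</body".
# Assumption: a naive pattern, not a full HTML tokenizer.
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")


def l(c: str) -> str:
    """Lowercase form of a character (stands for bit '0')."""
    return c.lower()


def u(c: str) -> str:
    """Uppercase form of a character (stands for bit '1')."""
    return c.upper()


def embed(cover_html: str, bits: str) -> str:
    """Hide `bits` in the case of tag-name letters, in document order."""
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only '0' and '1'")
    pending = iter(bits)
    used = 0

    def recase(match):
        nonlocal used
        out = []
        for c in match.group(2):
            if c.isalpha():
                bit = next(pending, None)
                if bit is not None:  # letters after the payload stay as they were
                    c = u(c) if bit == "1" else l(c)
                    used += 1
            out.append(c)
        return match.group(1) + "".join(out)

    stego = TAG_NAME.sub(recase, cover_html)
    if used < len(bits):
        raise ValueError("cover page has too few tag letters for the payload")
    return stego
```

For example, `embed("<html><body></body></html>", "1011")` returns `"<HtML><body></body></html>"`: the four letters of the first tag name absorb the four bits and the rest of the page is unchanged. The capacity of a page under this sketch is one bit per letter in its tag names.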
Extraction Process
Extraction is straightforward and mirrors the embedding process in reverse. The recipient works from the page's source code rather than the rendered page, and recovers the secret bits from the case of the letters within the HTML tags:
- Secret Data Retrieval: By parsing the HTML tags and examining the character casing, the embedded bits can be reconstructed. If a character is uppercase, it counts as a '1', while a lowercase character is a '0'.
- Automation and Algorithm Use:
- A DFA (Deterministic Finite Automaton) is used to streamline the extraction process.
- The extraction algorithm follows this retrieval flow and reconstructs the entire secret message, using the message length recorded in the embedded header to know when to stop.
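The retrieval flow can be sketched as a small hand-written state machine. This is not the paper's DFA, only an illustration in the same spirit, under these assumptions: bits are read only from letters in tag names, and the number of bits to read is passed in as `n_bits` as a stand-in for the length stored in the header. The function name `extract` and the state names are hypothetical.

```python
def extract(stego_html: str, n_bits: int) -> str:
    """Read n_bits from the case of tag-name letters, in document order."""
    # States: outside a tag, just after "<", just after "</", inside a tag name.
    TEXT, OPEN, SLASH, NAME = range(4)
    state = TEXT
    bits = []
    for ch in stego_html:
        is_letter = ch.isascii() and ch.isalpha()
        is_digit = ch.isascii() and ch.isdigit()
        if ch == "<":
            state = OPEN
        elif state == OPEN and ch == "/":
            state = SLASH
        elif is_letter and state in (OPEN, SLASH, NAME):
            # Uppercase letter encodes '1', lowercase encodes '0'.
            bits.append("1" if ch.isupper() else "0")
            state = NAME
        elif is_digit and state == NAME:
            pass  # digits such as the "1" in "h1" carry no bit
        else:
            state = TEXT  # attributes, text, and declarations are skipped
    if len(bits) < n_bits:
        raise ValueError("page holds fewer bits than requested")
    return "".join(bits[:n_bits])
```

For example, `extract("<HtML><body></body></html>", 4)` returns `"1011"`. Passing the length explicitly keeps the sketch short; in the scheme described above, the extractor would first recover that length from the header and then read the payload.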
Advantages and Implications
This method embeds covert messages in web pages without altering their visual content, which sets it apart from image-based steganography, where embedding can introduce visual artifacts. The technique is significant for two reasons:
- Non-Destructive to Host Data: Unlike in image steganography, where perceptible distortion is a potential issue, this HTML-based approach maintains the page's original appearance, thus avoiding suspicion.
- Versatility: Although demonstrated with HTML, the technique is applicable to other case-insensitive programming languages (e.g., BASIC, parts of Pascal) and to sections of case-sensitive languages (e.g., C) where letter case does not affect behavior, such as comments.
The paper concludes by discussing possible extensions of the technique to other programming languages and by noting its main advantage, that data can be embedded confidentially without visible distortion. It emphasizes redundancy and imperceptibility as the guiding principles.