Understanding the architecture of Selenium WebDriver gives us a clearer picture of how the framework operates. It is also an automation engineer interview question I once encountered and, unfortunately, could not answer at the time. The diagram above shows that WebDriver follows a client-server model, with an intermediary layer called the JSON Wire Protocol in which REST APIs play a critical role. (In Selenium 4, the JSON Wire Protocol has been superseded by the W3C WebDriver protocol, but the client-server, JSON-over-HTTP design described here is the same.) Detailed explanations follow.
Functions of the Selenium Client Library (SCL)
The SCL sits on the client side and supports programming languages such as Java, Python, .NET, Ruby, and JavaScript. For instance, to use the SCL for Java, you can download the libraries as jar files from https://www.selenium.dev/downloads/ and include them in your project, or, more simply, use Maven (https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java/4.27.0) to download and manage them automatically. The commands you commonly use to interact with a page, such as driver.get("https://www.giaphi.com"), driver.findElement(), and .click(), come from these libraries.
The Role of JSON Wire Protocol (JWP)
The JWP acts as an intermediary between the SCL (client) and the driver plus browser (server). It carries requests from the SCL to the browser driver's HTTP server via REST APIs. Each client request is serialized (converted from object data into JSON) before it is sent; in the other direction, the server's JSON response is deserialized before the result is handed back to the client.
For example, suppose you write the command driver.get("https://www.giaphi.com/") to navigate to the page "https://www.giaphi.com/". The following happens:
1) The command is handed to the JWP, which serializes its data (here, the URL of the website to navigate to) into a JSON payload.
2) The JWP then forwards the command to the ChromeDriver via a REST API, sending a POST request to http://localhost:{{DEBUGGING_PORT}}/session/:sessionId/url. How to obtain the DEBUGGING_PORT and sessionId is covered later in the article. The equivalent cURL command looks like this:
curl --location 'http://localhost:18633/session/c826aa2828884e046f5de30cf624694a/url' \
--header 'Content-Type: application/json' \
--data '{
"url": "https://www.giaphi.com"
}'
3) Finally, the ChromeDriver passes the request to the Chrome browser, which performs the navigation to the new URL. The result travels back in the reverse direction: from the browser to the driver, from the driver to the JWP, and from the JWP to the client.
👉The above API information is sourced from here: https://www.selenium.dev/documentation/legacy/json_wire_protocol/#sessionsessionidurl.
👉Every Selenium command has a corresponding REST API, and you can look up the details of how to use each API through the following link: https://www.selenium.dev/documentation/legacy/json_wire_protocol/
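To make the cURL call above more concrete in Java, here is a minimal sketch that builds the same POST request using only the JDK's java.net.http package. The class name NavigateRequestSketch and its helper methods are my own illustrative names, not part of Selenium, and the port and sessionId are simply the sample values from the cURL command. The request is only built, not sent, so it runs without a live ChromeDriver.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative sketch only: these names are hypothetical, not Selenium APIs.
class NavigateRequestSketch {

    // Serializes the navigation target into the JSON body used by the cURL example.
    static String buildPayload(String targetUrl) {
        // Escape backslashes and double quotes so the URL stays a valid JSON string.
        String escaped = targetUrl.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"url\": \"" + escaped + "\"}";
    }

    // Builds (but does not send) POST /session/:sessionId/url for a local driver.
    static HttpRequest buildNavigateRequest(int port, String sessionId, String targetUrl) {
        URI endpoint = URI.create(
                "http://localhost:" + port + "/session/" + sessionId + "/url");
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildPayload(targetUrl)))
                .build();
    }

    public static void main(String[] args) {
        // Sample port and sessionId copied from the cURL command; yours will differ.
        HttpRequest request = buildNavigateRequest(
                18633, "c826aa2828884e046f5de30cf624694a", "https://www.giaphi.com");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(buildPayload("https://www.giaphi.com"));
        // To actually send it, a ChromeDriver session must be listening on that port:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Keeping the payload builder separate from the request builder mirrors the two steps described above: first the data is serialized to JSON, then it is sent over HTTP.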
What is WebDriver?
WebDriver driver = new ChromeDriver();
In the above example, "driver" is the object (instance), and ChromeDriver is a class that implements WebDriver. More precisely, ChromeDriver inherits from ChromiumDriver, ChromiumDriver inherits from RemoteWebDriver, and RemoteWebDriver implements WebDriver. You can see more here: https://www.selenium.dev/selenium/docs/api/java/org/openqa/selenium/chrome/ChromeDriver.html
Because this example targets the Chrome browser, we use ChromeDriver. Selenium also provides FirefoxDriver, InternetExplorerDriver, and SafariDriver for Firefox, Internet Explorer, and Safari, respectively.
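The inheritance chain described above can be pictured with a small stand-in hierarchy. Note that the types below are simplified placeholders I wrote for illustration, wrapped in a hypothetical HierarchySketch class; they only borrow the names of the real Selenium types and contain none of their logic. The point is simply to show why a ChromeDriver instance can be assigned to a WebDriver variable.

```java
// Illustrative sketch only: simplified stand-ins, NOT the real Selenium classes.
class HierarchySketch {

    // Stand-in for the WebDriver interface, reduced to a single method.
    interface WebDriver {
        void get(String url);
    }

    // Stand-in for RemoteWebDriver: the real class sends the command over HTTP.
    static class RemoteWebDriver implements WebDriver {
        String lastUrl;

        @Override
        public void get(String url) {
            lastUrl = url; // here we only record the URL
        }
    }

    // Stand-ins for the two remaining levels of the chain.
    static class ChromiumDriver extends RemoteWebDriver { }

    static class ChromeDriver extends ChromiumDriver { }

    public static void main(String[] args) {
        // Valid because ChromeDriver is a WebDriver through its ancestors.
        WebDriver driver = new ChromeDriver();
        driver.get("https://www.giaphi.com");

        System.out.println(driver instanceof RemoteWebDriver);                   // true
        System.out.println(ChromeDriver.class.getSuperclass().getSimpleName()); // ChromiumDriver
    }
}
```

Declaring the variable as WebDriver rather than ChromeDriver keeps the test code independent of the browser: swapping in another driver class only changes the right-hand side of the assignment.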
Empirical Verification
At this point, we understand how a Selenium command operates. However, how can we prove that everything I've explained is correct? I created the video below to give you a clearer picture. Basically, my approach is as follows:
- During the browser launch step, I add parameters that log all requests and responses to the chromedriver.log file.
- I run any test case in debug mode to obtain the Debugging Port, then call the /sessions API to get the sessionId.
- With these two pieces of information, I call the /session/:sessionId/url API to navigate to any website.
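The last two steps could be sketched in Java roughly as follows, again using only the JDK. The class SessionLookupSketch and its methods are hypothetical names of my own, and the sample response body is an assumption: the JSON Wire Protocol documentation says /sessions returns a list of objects with "id" and "capabilities" keys, but the exact output of your driver may differ. The regex lookup is a shortcut for the sketch; a real JSON parser would be more robust.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: these names are hypothetical, not Selenium APIs.
class SessionLookupSketch {

    // Matches the first "id": "<value>" pair in a /sessions response body.
    private static final Pattern SESSION_ID =
            Pattern.compile("\"id\"\\s*:\\s*\"([^\"]+)\"");

    // Builds (but does not send) GET /sessions for a driver on localhost.
    static HttpRequest buildListSessionsRequest(int port) {
        return HttpRequest.newBuilder(URI.create("http://localhost:" + port + "/sessions"))
                .GET()
                .build();
    }

    // Extracts the first session id from the response body, if there is one.
    static Optional<String> firstSessionId(String responseBody) {
        Matcher matcher = SESSION_ID.matcher(responseBody);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        int port = 18633; // sample Debugging Port; read yours from the debug session

        System.out.println(buildListSessionsRequest(port).uri());

        // Assumed response shape, for illustration; a real driver returns more fields.
        String sampleBody = "{\"status\":0,\"value\":[{\"capabilities\":"
                + "{\"browserName\":\"chrome\"},\"id\":\"c826aa2828884e046f5de30cf624694a\"}]}";

        firstSessionId(sampleBody).ifPresent(id ->
                System.out.println("http://localhost:" + port + "/session/" + id + "/url"));
    }
}
```

With the sessionId in hand, the second printed URL is the endpoint that receives the POST request carrying the target website.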
I hope this article provides you with useful information about how Selenium operates. If you have any questions or suggestions, please leave a comment below. Have a nice day.