Capture screenshot and description of a web page using Go and Chromedp

Apr 5, 2023 CHROMEDP GO WEB

Chat applications such as Microsoft Teams and Zoom show a snapshot of a web page when you share a link to it. This is a very useful feature: it gives the user a quick look at the page along with a short description of the link.

Go has an excellent library called chromedp that lets you drive a Chrome browser from code: taking screenshots, filling out and submitting forms, sending key events, emulating devices, evaluating scripts, and more. You can look at the complete example list here.

We are interested in taking a screenshot of the web page at a given URL. Chromedp allows you to take a screenshot of a specific element of the page or of the entire browser viewport.

My use case is to capture just enough of the web page to give the user an idea of what the website is about. Recognizing a page you have already visited is easier from a screenshot than from the link alone. Here is some sample code that snaps a screenshot of a given web page.

package main

import (
 "context"
 "crypto/rand"
 "fmt"
 "log"
 "os"
 "time"
 "unsafe"

 "github.com/chromedp/cdproto/emulation"
 "github.com/chromedp/chromedp"
)

const (
 UserAgentName = "Websnapv1.0"
 Path          = "images\\"
 Timeout       = 15
)

// https://stackoverflow.com/questions/22892120/how-to-generate-a-random-string-of-a-fixed-length-in-go
func GenRandStr(n int) string {
 var alphabet = []byte("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
 b := make([]byte, n)
 rand.Read(b)
 for i := 0; i < n; i++ {
  b[i] = alphabet[b[i]%byte(len(alphabet))]
 }
 return *(*string)(unsafe.Pointer(&b))
}

func Snap(url string) (string, error) {
 ctx, cancel := chromedp.NewContext(context.Background())
 defer cancel()

 // Creating timeout for 15 seconds
 ctx, cancel = context.WithTimeout(ctx, time.Second*Timeout)
 defer cancel()

 name := GenRandStr(12)
 var buf []byte
 var file = Path + name + ".png"

 err := chromedp.Run(ctx,
  emulation.SetUserAgentOverride(UserAgentName),
  chromedp.Navigate(url),
  //chromedp.FullScreenshot(&buf, 100),
  chromedp.CaptureScreenshot(&buf),
 )

 if err != nil {
  // log.Printf instead of log.Fatalf, which would exit before the return runs
  log.Printf("ERROR:webext - Unable to capture a screenshot of the given URL - %s: %v\n", url, err)
  return "", err
 }

 if err := os.WriteFile(file, buf, 0o644); err != nil {
  log.Printf("ERROR:webext - Cannot create snap for the link - %s: %v\n", url, err)
  return "", err
 }

 return file, nil
}

func main() {
 i, err := Snap("https://medium.com/@prashantkhandelwal/marvel-comics-universe-graph-database-7264381478e1")
 if err != nil {
  log.Fatalf("ERROR: Unable to retrieve screenshot - %v", err.Error())
 }

 // print the image path
 fmt.Println(i)
}

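Here, I have declared a few constants that are easy to understand: the user agent name, the output path, and the timeout in seconds. Note that Path uses a Windows-style separator and that the images directory must already exist, because os.WriteFile does not create missing directories. Then I have a random string generator, which I found on Stack Overflow, to generate random image names. Finally, I have a function named Snap which takes a screenshot of the web page. Overall, this function is very simple to understand, but I want you to pay special attention to the part where chromedp is used.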

err := chromedp.Run(ctx, 
  emulation.SetUserAgentOverride(UserAgentName),
  chromedp.Navigate(url), 
  chromedp.CaptureScreenshot(&buf), 
 )

CaptureScreenshot captures the part of the web page that is currently visible in the browser viewport. It accepts a pointer to a buffer and fills it with the image data, which Snap then writes to an image file. Here is an example output:

Screenshot from `CaptureScreenshot` function

The next useful function is FullScreenshot, which captures the entire web page as an image rather than only the visible part. You can use it in functional tests to check how your web page looks when accessed from a different location or with different parameters. It takes two parameters: the first is the pointer to the buffer, just like before, and the second is the quality of the image (100 in the commented-out line of Snap above). Because the screenshot covers the whole page, I am showing it inside an image viewer program to give you a sense of its size. Here is an example output:

Screenshot from `FullScreenshot` function

With screenshots done, let’s also get some basic information about the web page. Let’s create a new function named ExtractMeta. It accepts the URL of the web page as a parameter and returns a pointer to a WebData struct, which holds the Title and Description of the web page. This function looks almost exactly like the Snap function, except for a slight change in how Run is used and some variable declarations to hold the returned values. Here is the code for extracting the metadata:

var pageTitle, description string

var w = &WebData{}

err := chromedp.Run(ctx,
 emulation.SetUserAgentOverride(UserAgentName),
 chromedp.Navigate(url),
 chromedp.Title(&pageTitle),
 chromedp.Evaluate(`document.querySelector("meta[name^='description' i]").getAttribute('content');`, &description),
)

w.Title = pageTitle
w.Description = description

Notice that the Run call has two additional actions, chromedp.Title and chromedp.Evaluate. chromedp.Title stores the title of the web page in the given variable. chromedp.Evaluate executes a JavaScript expression on the page being visited and returns the result so you can use it. For our use case, which is to get the description of the web page, we run document.querySelector to find the meta tag whose name starts with description (the ^= operator) and read its content attribute. The i is the case-insensitivity qualifier here. If the page has no such tag, querySelector returns null, the expression throws, and Run returns an error. Add the below code to the main function:

w, err := ExtractMeta("https://medium.com/@prashantkhandelwal/marvel-comics-universe-graph-database-7264381478e1")
if err != nil {
 log.Fatalf("ERROR: Unable to retrieve metadata - %v", err.Error())
}
fmt.Println(w.Title)
fmt.Println(w.Description)

Executing the above code generates output like this:

Final output of the entire program

Similarly, you can call chromedp.Evaluate multiple times in the same Run call to get other information from the web page as desired.
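
When reading several meta tags this way, it helps if a missing tag yields an empty string instead of a JavaScript error. The sketch below builds such an expression using optional chaining, which assumes a reasonably recent Chrome. The helper name metaContentExpr is hypothetical, it matches the attribute value exactly (=) rather than by prefix (^=), and it assumes the attribute name and value are plain text without quote characters.

```go
package main

import (
	"fmt"
	"strconv"
)

// metaContentExpr returns a JavaScript expression that reads the content
// attribute of the first <meta> tag whose attribute attr equals value
// (case-insensitively). The expression evaluates to "" when no such tag
// or no content attribute exists.
func metaContentExpr(attr, value string) string {
	selector := fmt.Sprintf("meta[%s='%s' i]", attr, value)
	// strconv.Quote wraps the selector in a double-quoted string literal.
	return fmt.Sprintf(`document.querySelector(%s)?.getAttribute("content") ?? ""`,
		strconv.Quote(selector))
}

func main() {
	// Expression for the standard description tag.
	fmt.Println(metaContentExpr("name", "description"))
	// Expression for an Open Graph title, which uses the property attribute.
	fmt.Println(metaContentExpr("property", "og:title"))
}
```

The resulting string can be passed to chromedp.Evaluate in place of the literal expression, for example chromedp.Evaluate(metaContentExpr("name", "description"), &description).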

Here is the complete code for reference:

package main

import (
    "context"
    "crypto/rand"
    "fmt"
    "log"
    "os"
    "time"
    "unsafe"

    "github.com/chromedp/cdproto/emulation"
    "github.com/chromedp/chromedp"
)

const (
    UserAgentName = "Websnapv1.0"
    Path          = "images\\"
    Timeout       = 15
)

type WebData struct {
    Title       string
    Description string
}

// https://stackoverflow.com/questions/22892120/how-to-generate-a-random-string-of-a-fixed-length-in-go
func GenRandStr(n int) string {
    var alphabet = []byte("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
    b := make([]byte, n)
    rand.Read(b)
    for i := 0; i < n; i++ {
        b[i] = alphabet[b[i]%byte(len(alphabet))]
    }
    return *(*string)(unsafe.Pointer(&b))
}

func Snap(url string) (string, error) {
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()

    // Creating timeout for 15 seconds
    ctx, cancel = context.WithTimeout(ctx, time.Second*Timeout)
    defer cancel()

    name := GenRandStr(12)
    var buf []byte
    var file = Path + name + ".png"

    err := chromedp.Run(ctx,
        emulation.SetUserAgentOverride(UserAgentName),
        chromedp.Navigate(url),
        //chromedp.FullScreenshot(&buf, 100),
        chromedp.CaptureScreenshot(&buf),
    )

    if err != nil {
        // log.Printf instead of log.Fatalf, which would exit before the return runs
        log.Printf("ERROR:webext - Unable to capture a screenshot of the given URL - %s: %v\n", url, err)
        return "", err
    }

    if err := os.WriteFile(file, buf, 0o644); err != nil {
        log.Printf("ERROR:webext - Cannot create snap for the link - %s: %v\n", url, err)
        return "", err
    }

    return file, nil
}

func ExtractMeta(url string) (*WebData, error) {
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()

    // Creating timeout for 15 seconds
    ctx, cancel = context.WithTimeout(ctx, time.Second*Timeout)
    defer cancel()

    var pageTitle, description string

    var w = &WebData{}

    err := chromedp.Run(ctx,
        emulation.SetUserAgentOverride(UserAgentName),
        chromedp.Navigate(url),
        chromedp.Title(&pageTitle),
        chromedp.Evaluate(`document.querySelector("meta[name^='description' i]").getAttribute('content');`, &description),
    )

    if err != nil {
        // log.Printf instead of log.Fatalf, which would exit before the return runs
        log.Printf("ERROR:webext - Unable to extract meta(s) from the given URL - %s: %v\n", url, err)
        return nil, err
    }

    w.Title = pageTitle
    w.Description = description

    return w, nil
}

func main() {
    i, err := Snap("https://medium.com/@prashantkhandelwal/marvel-comics-universe-graph-database-7264381478e1")
    if err != nil {
        log.Fatalf("ERROR: Unable to retrieve screenshot - %v", err.Error())
    }

    // print the image path
    fmt.Println(i)

    // Extract metadata of the page
    w, err := ExtractMeta("https://medium.com/@prashantkhandelwal/marvel-comics-universe-graph-database-7264381478e1")
    if err != nil {
        log.Fatalf("ERROR: Unable to retrieve metadata - %v", err.Error())
    }
    fmt.Println(w.Title)
    fmt.Println(w.Description)
}
