Speed Up Your Job Hunt With This LinkedIn Scraper and Your System Notifications
In the fast-paced world of job hunting, staying ahead of the competition means being the first to know about new opportunities. With the power of automation, you can have a system that scrapes LinkedIn for job postings and notifies you in real-time. This blog post dives into a JavaScript script using Puppeteer, Node-Notifier, and Node-Cron to make your job search more efficient.

Setting the Stage: The Components

The script integrates several key components:
  1. Puppeteer: A Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. It allows us to navigate the web programmatically.
  2. Node-Notifier: A Node.js module for sending notifications on native systems (Windows, Mac, Linux).
  3. Node-Cron: A task scheduler in pure JavaScript for running scheduled jobs.
  4. CSV File: Stores the job search configurations.
  5. Cache File: Maintains a log of previously seen job postings to avoid duplicate notifications.

Step-by-Step Breakdown

1. Reading Configurations

The script begins by reading configurations from a CSV file, searches.csv, which contains search terms, intervals, and LinkedIn URLs:
const fs = require('fs');

const configCSV = fs.readFileSync('./searches.csv', {
  encoding: 'utf-8',
});

// Split into rows and columns, then drop the header row.
const configArr = configCSV.split('\n').map((line) => line.split(','));
configArr.shift();

// Keep only well-formed rows of exactly three columns.
const input = configArr
  .filter((x) => x.length === 3)
  .map(([title, minuteLapse, url]) => ({
    url,
    title,
    minuteLapse: Number.parseInt(minuteLapse, 10),
  }));
This code reads the CSV file, splits it into lines, and maps each line to an object containing the job title, time interval, and LinkedIn search URL.
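For reference, a searches.csv along these lines would parse cleanly. The header row is discarded by the script, and the titles, intervals, and URLs shown here are purely illustrative:

```csv
title,minuteLapse,url
Frontend Engineer,15,https://www.linkedin.com/jobs/search?keywords=frontend%20engineer&f_TPR=r3600
Data Analyst,30,https://www.linkedin.com/jobs/search?keywords=data%20analyst&f_TPR=r3600
```

Because the script splits naively on commas, none of the values may themselves contain a comma.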

2. Caching Previous Job Postings

To avoid notifying about the same job postings multiple times, the script reads a cache file that stores previously seen URLs:
// Create the cache file on first run so readFileSync doesn't throw.
if (!fs.existsSync('./cache')) fs.writeFileSync('./cache', '');

const _cachedUrls = fs.readFileSync('./cache', {
  encoding: 'utf-8',
});

// Split on newlines and discard empty entries; a fresh cache yields an empty array.
const cachedUrls = _cachedUrls.split('\n').filter((x) => x.length !== 0);

3. Scraping LinkedIn

The main function grabLinks uses Puppeteer to navigate to LinkedIn search pages, scrape job posting links, and filter out those already seen:
const puppeteer = require('puppeteer');
const notifier = require('node-notifier');
const { promisify } = require('util');
const exec = promisify(require('child_process').exec);

// Open at most this many links per notification click.
const limit = 10;

const grabLinks = async ({ title, url, minuteLapse }) => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // f_TPR is LinkedIn's "time posted" filter, in seconds. Look back one hour
    // for frequent searches, or the full interval for longer ones.
    const properUrl = url
      .replace(/f_TPR=r\d+/, minuteLapse <= 60 ? 'f_TPR=r3600' : `f_TPR=r${minuteLapse * 60}`)
      .replace(/\n/g, '')
      .trim();

    await page.goto(properUrl);

    const topLinks = await page.evaluate(() => {
      const results = [];
      document.querySelectorAll('#main-content > section.two-pane-serp-page__results-list > ul > li > div > a').forEach((node) => {
        results.push([node.innerText, node.getAttribute('href')]);
      });
      return results;
    });

    await browser.close();

    // Keep only links whose canonical URL (query string stripped) isn't cached.
    const links = topLinks.filter(([_title, link]) => {
      if (cachedUrls.length === 0) return true;
      return !cachedUrls.find((x) => encodeURI(link.split('?')[0]).includes(encodeURI(x)));
    });

    if (links.length > 0) {
      notifier.notify({
        title,
        message: `Found ${links.length} Job Postings`,
        sound: 'Basso',
        actions: 'Open in Chrome',
        timeout: 120,
        contentImage: `${process.cwd()}/LinkedIn.svg`,
      }, async function (_error, response) {
        if (response === 'activate') {
          // macOS-specific: open each link in Chrome, pausing briefly between tabs.
          const command = links.slice(0, limit).map(([_, _url]) => `open -a "Google Chrome" "${_url}"`).join(' && sleep 0.1 && ');
          await exec(command);
          fs.writeFileSync('./cache', _cachedUrls + links.map(([_, _url]) => `${_url.split('?')[0]}\n`).join(''), {
            encoding: 'utf-8',
          });
          // A full page of new results suggests there are more; re-run immediately.
          if (links.length >= 10) grabLinks({ title, url, minuteLapse });
        }
      });
    }
};
This function:
  1. Launches a headless browser and navigates to the LinkedIn search URL.
  2. Scrapes job links from the page.
  3. Filters out links already seen and stored in the cache.
  4. Sends a system notification if new job postings are found.
  5. Opens the links in Chrome (via macOS's open command, so this part is Mac-only) if the user interacts with the notification.
  6. Updates the cache with the new job links.
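The deduplication step is worth isolating. It compares each scraped link's canonical form (everything before the ?) against the cached list; here is the same predicate as a self-contained sketch, with made-up URLs:

```javascript
// Cached canonical URLs, as read from the cache file.
const cachedUrls = ['https://example.com/jobs/view/111'];

// Scraped [title, href] pairs; hrefs carry tracking query strings.
const topLinks = [
  ['Backend Engineer', 'https://example.com/jobs/view/111?refId=abc'],
  ['Data Analyst', 'https://example.com/jobs/view/222?refId=def'],
];

// Keep only links whose canonical URL is not already cached.
const links = topLinks.filter(([_title, link]) => {
  if (cachedUrls.length === 0) return true;
  return !cachedUrls.find((x) => encodeURI(link.split('?')[0]).includes(encodeURI(x)));
});

console.log(links.map(([t]) => t)); // [ 'Data Analyst' ]
```

Only the unseen posting survives; the one already in the cache is silently dropped, which is what prevents repeat notifications.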

4. Scheduling the Job Scraper

To continuously check for new job postings, the script uses Node-Cron to schedule the grabLinks function at specified intervals:
const cron = require('node-cron');

(function run() {
  input.forEach(({ minuteLapse, title, url }) => {
    // Run each search once immediately, then on a repeating schedule.
    grabLinks({ title, url, minuteLapse });
    cron.schedule(`*/${minuteLapse} * * * *`, () => {
      grabLinks({ title, url, minuteLapse });
    });
  });
})();
This immediately invoked function loops through each job search configuration, runs it once right away, and sets up the scraper to repeat at the defined interval.
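One caveat with */N minute expressions: they match minutes divisible by N within each hour, so */45 fires at :00 and :45 rather than every 45 minutes. If you want intervals above 59 minutes, one workaround (a sketch, not part of the original script) is to switch to an hourly expression:

```javascript
// Build a cron expression for an interval given in minutes. Intervals of an
// hour or more are rounded to whole hours, since */N minute steps reset
// every hour and would not fire evenly.
function cronExpressionFor(minuteLapse) {
  if (minuteLapse < 60) return `*/${minuteLapse} * * * *`;
  const hours = Math.round(minuteLapse / 60);
  return `0 */${hours} * * *`;
}

console.log(cronExpressionFor(15));  // */15 * * * *
console.log(cronExpressionFor(120)); // 0 */2 * * *
```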

Conclusion

This script automates the process of finding and notifying about new job opportunities on LinkedIn. By leveraging Puppeteer for web scraping, Node-Notifier for system notifications, and Node-Cron for scheduling tasks, it ensures you stay on top of new job postings with minimal effort. Whether you're an active job seeker or just keeping an eye out for the next big opportunity, this automated solution can give you the edge in your job search.