Alex Saveau

Get your own personal Code Search

Understand the links between your repositories and third-party code

Published Jun 17, 2020 • Last updated Jun 20, 2020 • 3 min read

Example search for a Gradle API

Comparing a search for examples on how to use Gradle's new Configuration Cache APIs between GitHub and my codesearch instance.


When it comes to finding a piece of code, GitHub’s search can be less than helpful at times. In stark contrast to that, the public version of Google’s Code Search tooling absolutely kicks ass. cs.android.com and cs.opensource.google are my go-to tools for finding open source Google code, but unfortunately, they’re just that: Google Code Search. If you’re looking for non-Google code, you’re out of luck. Until now.

GCP offers “Cloud Source Repositories,” which are basically just private GitHub repos with one major benefit: Google will build a unified index of your code across all the repos in all the GCP projects you have access to. That means you can log in with an account that has access to your personal and work GCP projects to get one view into every line of code that matters to you.

For this guide, I’m going to assume your code is on GitHub, but GCP also supports Bitbucket out of the box. If your code isn’t on one of those two platforms (think GitLab), you can always set up manual mirroring.
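If you do end up on the manual route, here's a minimal sketch of what one mirroring pass could look like. It assumes the gcloud CLI is installed and logged in, and that the destination repo already exists in Cloud Source Repositories (for example, via `gcloud source repos create`). The function name and its arguments are placeholders of mine, not anything GCP prescribes.

```shell
# Sketch: mirror any Git remote (e.g. GitLab) into Cloud Source Repositories.
# Usage: mirror_to_csr <source-git-url> <gcp-project-id> <csr-repo-name>
mirror_to_csr() {
  src_url="$1"; project="$2"; repo="$3"
  dest_url="https://source.developers.google.com/p/${project}/r/${repo}"
  workdir="$(mktemp -d)"

  # Let git authenticate to Cloud Source Repositories through gcloud.
  # Note: this edits your global git config.
  git config --global credential.https://source.developers.google.com.helper gcloud.sh

  # Bare clone with every ref, then push only branches and tags to the mirror.
  git clone --mirror "$src_url" "$workdir/repo.git" &&
    git -C "$workdir/repo.git" push --prune "$dest_url" \
      '+refs/heads/*:refs/heads/*' '+refs/tags/*:refs/tags/*'
  status=$?

  rm -rf "$workdir"
  return "$status"
}
```

Unlike the webhook-based mirroring, nothing here runs automatically, so you'd want to call something like `mirror_to_csr https://gitlab.com/you/project.git alex-codesearch project` from a scheduled job.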

Let’s get going!

Fork third-party repos you care about

To mirror code from GitHub to GCP, you must be the owner of the repository in question (or at least have admin privileges). This is because Source Repositories will add a webhook to your GitHub repo that copies all changes as they happen. Since you probably don’t have admin privileges on third-party repositories, you can instead fork them and mirror the fork.

One small hiccup: your fork will quickly get out of date without regular syncing. Thankfully, GitHub Actions makes this problem trivially easy to solve. Put this cron job (it runs once a day at midnight UTC) in a .github/workflows/fork-sync.yml workflow file, and you’ll be all set:

name: Sync Fork

on:
  schedule:
    - cron: '0 0 * * *'

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - name: Sync
        uses: TG908/fork-sync@v1
        with:
          # You can't use the built-in GITHUB_TOKEN because it doesn't have write access in forks:
          # https://help.github.com/en/actions/configuring-and-managing-workflows/authenticating-with-the-github_token#permissions-for-the-github_token
          # Your personal token will only need the public_repo scope.
          # TODO(you): add your personal token in the repo's secrets page.
          github_token: ${{ secrets.TOKEN }}
          # TODO(you): change this to the upstream repository owner. In this case, I'm mirroring
          # https://github.com/gradle/gradle to https://github.com/SUPERCILEX/gradle and telling
          # the action that `gradle` is the upstream repo to pull from.
          owner: gradle

Create a GCP project

To keep things organized, create a new GCP project just for code mirroring. I’ve called mine alex-codesearch.

Note: I believe Source Repositories will require setting up a billing account at some point, but the free quotas are extremely generous and I have yet to pay a cent.
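If you'd rather stay in a terminal than click through the Cloud Console, project creation can be sketched roughly as follows. This assumes the gcloud CLI is installed and authenticated. The function name and the "Code Search" display name are my own placeholders, and project IDs are globally unique, so you'll need to pick your own.

```shell
# Sketch: create a dedicated project and enable the Source Repositories API.
# Usage: create_codesearch_project <project-id>
create_codesearch_project() {
  project="$1"
  gcloud projects create "$project" --name="Code Search" &&
    gcloud services enable sourcerepo.googleapis.com --project="$project"
}
```

For example: `create_codesearch_project alex-codesearch`.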

Mirror your repos to GCP

You can now finally connect your repositories.

Pro tip: if you’re going to mirror more than a few repos, I recommend opening Chrome DevTools before clicking Connect selected repository. After clicking connect, look for the PATCH request in the Network tab. You can then use Copy as fetch on that request, paste it into the Console, and replace the GitHub repo link near the bottom of the fetch URL with each of the other repos you’re trying to connect. Or, hopefully, GCP will have added support for connecting multiple repos at once in their UI by the time you read this.

You’re all set now. The index will start populating at source.cloud.google.com.

A note about shared access

In a corporate environment, you can create a Google Group in your G Suite domain and give that group email Viewer access to the Code Search GCP project. Anyone in the group will then be able to access Code Search.
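Granting that access from the command line might look like the sketch below. It assumes the gcloud CLI is authenticated as someone allowed to change the project's IAM policy. The function name is made up for illustration, and `roles/viewer` is the basic project-wide Viewer role, so swap in something narrower if that's too broad for your setup.

```shell
# Sketch: give a Google Group read access to the Code Search project.
# Usage: grant_codesearch_access <project-id> <group-email>
grant_codesearch_access() {
  gcloud projects add-iam-policy-binding "$1" \
    --member="group:$2" \
    --role="roles/viewer"
}
```

For example: `grant_codesearch_access alex-codesearch codesearch@example.com`, where the group address is a placeholder.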

As a fun little usage example, I wanted to migrate my GitHub Actions workflows to the latest releases, so I started searching for actions:

Sample search for actions

Oops, too many results. Let’s try only YAML files:

Sample search for actions in YAML files

Better, but still some extraneous results. Let’s try "actions/":

Sample search for actions/ in YAML files

Ah-ha! Since I know upload-artifact has a v2 version available, let’s find all the instances where I’m still using v1 of the action:

Sample search for actions/upload-artifact@v1 in YAML files

As you get better at Code Search, you’ll be able to skip straight to the last step, but this demo shows the power of having killer search capabilities.
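For comparison, the final query can be approximated against a single local checkout with plain grep. This is only a rough stand-in for Code Search, not how the index works: it assumes a grep that supports `--include` (GNU and BSD grep both do), and the function name is mine.

```shell
# Sketch: find YAML files that pin a given action at a given major version.
# Usage: find_action_version <dir> <action> <version>
# Prints file:line:match for each hit, e.g. for actions/upload-artifact at v1.
find_action_version() {
  # The trailing group keeps "v1" from also matching "v10", "v11", etc.
  grep -rnE --include='*.yml' --include='*.yaml' \
    "uses:[[:space:]]*$2@$3([^0-9]|\$)" "$1"
}
```

For example: `find_action_version . actions/upload-artifact v1`. The big difference is that this only sees the one repo you have checked out, whereas the index spans everything you've mirrored.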


Happy code searching!