Get your own personal Code Search
Understand the links between your repositories and third-party code
Published Jun 17, 2020 • Last updated Jun 20, 2020 • 3 min read
When it comes to finding a piece of code, GitHub’s search can be less than helpful at times. In stark contrast to that, the public version of Google’s Code Search tooling absolutely kicks ass. cs.android.com and cs.opensource.google are my go-to tools for finding open source Google code, but unfortunately, they’re just that: Google Code Search. If you’re looking for non-Google code, you’re out of luck. Until now.
GCP offers “Cloud Source Repositories,” which are basically just private GitHub repos with one major benefit: Google will build a unified index of your code across all the repos in all the GCP projects you have access to. That means you can log in with an account that has access to your personal and work GCP projects to get one view into every line of code that matters to you.
For this guide, I’m going to assume your code is on GitHub, but GCP also supports Bitbucket out of the box. If your code isn’t on one of those two platforms (think GitLab), you can always set up manual mirroring.
Let’s get going!
Fork third-party repos you care about
To mirror code from GitHub to GCP, you must be the owner of the repository in question (or at least have admin privileges). This is because Source Repositories will add a webhook to your GitHub repo that copies all changes as they happen. Since you probably don’t have admin privileges on third-party repositories, you can instead fork them and mirror the fork.
One small hiccup: your fork will quickly get out of date without regular syncing. Thankfully, GitHub Actions makes this problem trivially easy to solve. Put this cron-scheduled workflow in a .github/workflows/fork-sync.yml file, and you’ll be all set:
name: Sync Fork
on:
  schedule:
    - cron: '0 0 * * *'
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - name: Sync
        uses: TG908/fork-sync@v1
        with:
          # You can't use the built-in GITHUB_TOKEN because it doesn't have write access in forks:
          # https://help.github.com/en/actions/configuring-and-managing-workflows/authenticating-with-the-github_token#permissions-for-the-github_token
          # Your personal token will only need the public_repo scope.
          # TODO(you): add your personal token in the repo's secrets page.
          github_token: ${{ secrets.TOKEN }}
          # TODO(you): change this to the upstream repository owner. In this case, I'm mirroring
          # https://github.com/gradle/gradle to https://github.com/SUPERCILEX/gradle and telling
          # the action that `gradle` is the upstream repo to pull from.
          owner: gradle
Create a GCP project
To keep things organized, create a new GCP project just for code mirroring. I’ve called mine alex-codesearch.
Note: I believe Source Repositories will require setting up a billing account at some point, but the free quotas are extremely generous and I have yet to pay a cent.
Mirror your repos to GCP
You can now finally connect your repositories.
Pro tip: if you’re going to mirror more than a few repos, I recommend opening Chrome DevTools before clicking “Connect selected repository”. After clicking connect, look for the PATCH request in the Network tab. You can then “Copy as fetch” that request, paste it into the Console, and replace the GitHub repo link near the bottom of the fetch URL with whatever other repos you’re trying to connect. Or, hopefully, GCP will add support for connecting multiple repos at once in their UI by the time you read this.
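If you have a long list of repos, swapping the link by hand gets tedious. Here is a minimal sketch of the substitution step. Everything in it is an assumption for illustration: `buildConnectUrls` is a hypothetical helper, the example URL in the usage note is a made-up placeholder rather than the real GCP endpoint, and I’m guessing the repo link may show up either raw or percent-encoded in the URL you copied.

```typescript
// Hypothetical helper: none of these names come from GCP.
// `templateUrl` is whatever URL "Copy as fetch" gave you.
function buildConnectUrls(
  templateUrl: string,
  originalRepo: string,
  otherRepos: string[],
): string[] {
  // The repo link may appear percent-encoded or raw inside the URL, so try both.
  const encoded = encodeURIComponent(originalRepo);
  return otherRepos.map((repo) => {
    if (templateUrl.includes(encoded)) {
      return templateUrl.split(encoded).join(encodeURIComponent(repo));
    }
    if (templateUrl.includes(originalRepo)) {
      return templateUrl.split(originalRepo).join(repo);
    }
    // Fail loudly rather than replaying an unchanged request.
    throw new Error("Original repo link not found in the copied URL");
  });
}
```

For example, `buildConnectUrls("https://example.invalid/connect?repo=https%3A%2F%2Fgithub.com%2Fme%2Fa", "https://github.com/me/a", ["https://github.com/me/b"])` returns the same placeholder URL with the encoded link for `me/b` swapped in. You would then call `fetch` on each result with the same options object that “Copy as fetch” produced. The Console runs plain JavaScript, so drop the type annotations before pasting.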
Search!
You’re all set now. The index will start populating at source.cloud.google.com.
A note about shared access
In a corporate environment, you can create a Google Group in your G Suite domain and give that group’s email Viewer access to the Code Search GCP project. Anyone in the group will then be able to access Code Search.
As a fun little usage example, I wanted to migrate my GitHub Actions workflows to the latest releases, so I started searching for actions:
Oops, too many results. Let’s try only YAML files:
Better, but still some extraneous results. Let’s try "actions/":
Ah-ha! Since I know upload-artifact has a v2 version available, let’s find all the instances where I’m still using v1 of the action:
As you get better at Code Search, you’ll be able to skip straight to the last step, but this demo shows the power of having killer search capabilities.
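To make that last step concrete outside of Code Search, here is a rough local stand-in for the final query. It is only a sketch: `findActionUses` is a hypothetical helper I made up for illustration, it scans one workflow file’s text at a time, and it says nothing about how Code Search itself matches results.

```typescript
// Illustrative only: a local stand-in for the final search, not how Code Search works.
function findActionUses(workflow: string, action: string, version: string): number[] {
  // Escape regex metacharacters so names like "actions/upload-artifact" match literally.
  const escape = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // The lookahead keeps "v1" from also matching "v10" or "v1.2".
  const pattern = new RegExp(`uses:\\s*${escape(action)}@${escape(version)}(?![\\w.])`);
  const hits: number[] = [];
  workflow.split("\n").forEach((line, index) => {
    if (pattern.test(line)) hits.push(index + 1); // 1-based line numbers
  });
  return hits;
}
```

Calling `findActionUses(text, "actions/upload-artifact", "v1")` returns the line numbers where the old version is still referenced, which is the same question the last search answers across every mirrored repo at once.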
Happy code searching!