Using a managed object storage service (S3 or GCS)
Object storage is used by various Sourcegraph features, for example to store code graph indexes uploaded by users or the results of search jobs.
By default, Sourcegraph will use a sourcegraph/blobstore server bundled with the instance. This is intended as a temporary measure: we recommend that administrators configure self-hosted Sourcegraph to store this data in an AWS S3 or Google Cloud Storage (GCS) bucket following the guidance below. Doing so may decrease your hosting costs as persistent volumes are often more expensive than the same storage space in an object store service.
Starting in Sourcegraph 7.2, new instances only need to configure the Sourcegraph bucket, and Sourcegraph will use that single bucket for all features. Instances provisioned before 7.2 can continue to use their existing buckets, but should also provision the new Sourcegraph bucket, as it will become required for new features.
Sourcegraph bucket
Self-hosted Sourcegraph instances using S3 or GCS object storage that were provisioned before Sourcegraph 7.2 should provision an additional bucket following the guidance below. Sourcegraph will report a warning when this bucket is not present, as it will become required for new features in a future release. No action is required if you are using the default sourcegraph/blobstore.
The Sourcegraph bucket is intended to be the single bucket for new Sourcegraph features. Instead of creating one bucket per feature, new features store objects under namespaced key prefixes within this bucket.
Existing bucket configuration for code graph indexes and search jobs remains in use. This change ensures that future features can be enabled without requiring a new bucket for each one.
New instances deployed using Sourcegraph 7.2 or later can choose to provision only the Sourcegraph bucket: it will be used for both code graph indexes and search jobs if no explicit configuration is provided for those features.
Using GCS for the Sourcegraph bucket
Set the following environment variables to target a GCS bucket for shared Sourcegraph uploads.
SOURCEGRAPH_UPLOAD_BACKEND=GCS
SOURCEGRAPH_UPLOAD_BUCKET=<my bucket name>
SOURCEGRAPH_UPLOAD_GCP_PROJECT_ID=<my project id>
SOURCEGRAPH_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE=</path/to/file> (optional)
SOURCEGRAPH_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE_CONTENT=<{"my": "content"}> (optional)
If you are running on GKE with Workload Identity, or otherwise relying on
Application Default Credentials, you can omit the GCS credentials file
variables. Grant the roles/storage.objectAdmin role to the service accounts used by
the frontend, worker, precise-code-intel-worker, gitserver, and
searcher services in a GKE environment.
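That grant can be sketched with gsutil as follows; the bucket name, project, and service account emails below are placeholders for your own values, not names Sourcegraph defines:

```shell
# Sketch: grant object admin on the bucket to each workload's service account.
# Substitute your bucket and the accounts bound to the frontend, worker,
# precise-code-intel-worker, gitserver, and searcher deployments.
BUCKET=gs://my-sourcegraph-bucket
for SA in sourcegraph-frontend@my-project.iam.gserviceaccount.com \
          sourcegraph-worker@my-project.iam.gserviceaccount.com; do
  gsutil iam ch "serviceAccount:${SA}:roles/storage.objectAdmin" "${BUCKET}"
done
```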
Using S3 for the Sourcegraph bucket
Set the following environment variables to target an S3 bucket for shared Sourcegraph uploads.
SOURCEGRAPH_UPLOAD_BACKEND=S3
SOURCEGRAPH_UPLOAD_BUCKET=<my bucket name>
SOURCEGRAPH_UPLOAD_AWS_REGION=us-east-1
SOURCEGRAPH_UPLOAD_AWS_ENDPOINT=https://s3.us-east-1.amazonaws.com
SOURCEGRAPH_UPLOAD_AWS_ACCESS_KEY_ID=<your access key>
SOURCEGRAPH_UPLOAD_AWS_SECRET_ACCESS_KEY=<your secret key>
SOURCEGRAPH_UPLOAD_AWS_SESSION_TOKEN=<your session token> (optional)
SOURCEGRAPH_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true (optional; set to use the EC2 metadata API over static credentials)
SOURCEGRAPH_UPLOAD_AWS_USE_PATH_STYLE=false (optional)
If a non-default region is supplied, ensure that the subdomain of the
endpoint URL (the AWS_ENDPOINT value) matches the target region.
You don't need to set the SOURCEGRAPH_UPLOAD_AWS_ACCESS_KEY_ID environment
variable when using SOURCEGRAPH_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true
because role credentials will be automatically resolved. Attach the IAM role
to the EC2 instances hosting the frontend, worker,
precise-code-intel-worker, gitserver, and searcher containers in a
multi-node environment.
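In a Docker Compose deployment, these variables go in the environment of each consuming service. A minimal sketch follows; the service names here follow common Sourcegraph compose files and may differ in yours, and all values are placeholders:

```yaml
# docker-compose.override.yaml (sketch; values are placeholders)
services:
  sourcegraph-frontend-0:
    environment:
      - SOURCEGRAPH_UPLOAD_BACKEND=S3
      - SOURCEGRAPH_UPLOAD_BUCKET=my-sourcegraph-bucket
      - SOURCEGRAPH_UPLOAD_AWS_REGION=us-east-1
      - SOURCEGRAPH_UPLOAD_AWS_ENDPOINT=https://s3.us-east-1.amazonaws.com
      - SOURCEGRAPH_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true
  # Repeat the same environment block for the worker,
  # precise-code-intel-worker, gitserver, and searcher services.
```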
Automatically provision the Sourcegraph bucket
Most deployments should provision this bucket directly in their cloud provider and leave this disabled. If you would like to allow your Sourcegraph instance to manage the target bucket configuration, set the following environment variable:
This requires additional bucket-management permissions from your configured storage vendor (AWS or GCP).
SOURCEGRAPH_UPLOAD_MANAGE_BUCKET=true
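On AWS, the extra bucket-management permissions amount to bucket-level actions on top of the usual object access. A sketch of such an IAM policy follows; the bucket name is a placeholder, and the exact action set is an assumption that may vary by Sourcegraph version:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:GetLifecycleConfiguration",
        "s3:PutLifecycleConfiguration"
      ],
      "Resource": "arn:aws:s3:::my-sourcegraph-bucket"
    }
  ]
}
```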
Code Graph Indexes
To target a managed object storage service for storing code graph index uploads, you will need to set a handful of environment variables for configuration and authentication to the target service.
Starting in Sourcegraph 7.2, new instances only need to configure the Sourcegraph bucket, and Sourcegraph will use that single bucket for all features. If a separate bucket is needed for Code Graph Indexes, that can still be configured, but we recommend using one bucket.
- If you are running a sourcegraph/server deployment, set the environment variables on the server container
- If you are running via Docker Compose or Kubernetes, set the environment variables on the frontend, worker, and precise-code-intel-worker containers
Using S3 for the Code Graph Indexes bucket
To target an S3 bucket you've already provisioned, set the following environment variables. Authentication can be done through an access and secret key pair (and optional session token), or via the EC2 metadata API.
Never commit AWS access keys in Git. You should consider using a secret handling service offered by your cloud provider.
PRECISE_CODE_INTEL_UPLOAD_BACKEND=S3
PRECISE_CODE_INTEL_UPLOAD_BUCKET=<my bucket name>
PRECISE_CODE_INTEL_UPLOAD_AWS_REGION=us-east-1
PRECISE_CODE_INTEL_UPLOAD_AWS_ENDPOINT=https://s3.us-east-1.amazonaws.com
PRECISE_CODE_INTEL_UPLOAD_AWS_ACCESS_KEY_ID=<your access key>
PRECISE_CODE_INTEL_UPLOAD_AWS_SECRET_ACCESS_KEY=<your secret key>
PRECISE_CODE_INTEL_UPLOAD_AWS_SESSION_TOKEN=<your session token> (optional)
PRECISE_CODE_INTEL_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true (optional; set to use the EC2 metadata API over static credentials)
If a non-default region is supplied, ensure that the subdomain of the
endpoint URL (the AWS_ENDPOINT value) matches the target region.
You don't need to set the PRECISE_CODE_INTEL_UPLOAD_AWS_ACCESS_KEY_ID
environment variable when using
PRECISE_CODE_INTEL_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true because role
credentials will be automatically resolved. Attach the IAM role to the EC2
instances hosting the frontend, worker, and precise-code-intel-worker
containers in a multi-node environment.
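For reference, the object-level access such an IAM role needs can be sketched as a policy like the following; the bucket name is a placeholder, and the action list is an assumption about typical read/write/delete usage, not an official minimal set:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-code-intel-bucket"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::my-code-intel-bucket/*"
    }
  ]
}
```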
Using GCS for the Code Graph Indexes bucket
To target a GCS bucket you've already provisioned, set the following environment variables.
PRECISE_CODE_INTEL_UPLOAD_BACKEND=GCS
PRECISE_CODE_INTEL_UPLOAD_BUCKET=<my bucket name>
PRECISE_CODE_INTEL_UPLOAD_GCP_PROJECT_ID=<my project id>
PRECISE_CODE_INTEL_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE=</path/to/file> (optional)
PRECISE_CODE_INTEL_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE_CONTENT=<{"my": "content"}> (optional)
If you are running on GKE with Workload Identity, or otherwise relying on
Application Default Credentials, you can omit the GCS credentials file
variables. Grant the roles/storage.objectAdmin role to the service accounts used by
the frontend, worker, and precise-code-intel-worker services in a GKE environment.
Automatically provision the Code Graph Indexes bucket
If you would like to allow your Sourcegraph instance to manage the creation and lifecycle configuration of the target bucket, set the following environment variables:
This requires additional bucket-management permissions from your configured storage vendor (AWS or GCP).
PRECISE_CODE_INTEL_UPLOAD_MANAGE_BUCKET=true
PRECISE_CODE_INTEL_UPLOAD_TTL=168h (default)
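The default TTL of 168h corresponds to seven days. If you provision the bucket yourself rather than letting Sourcegraph manage it, a roughly equivalent S3 lifecycle configuration would look like this (the rule ID and empty filter are placeholders):

```json
{
  "Rules": [
    {
      "ID": "expire-code-graph-uploads",
      "Status": "Enabled",
      "Filter": {},
      "Expiration": { "Days": 7 }
    }
  ]
}
```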
Search Job results
To target a managed object storage service for storing Search Job results, you will need to set a handful of environment variables for configuration and authentication to the target service.
Starting in Sourcegraph 7.2, new instances only need to configure the Sourcegraph bucket, and Sourcegraph will use that single bucket for all features. If a separate bucket is needed for Search Job results, that can still be configured, but we recommend using one bucket.
- If you are running a sourcegraph/server deployment, set the environment variables on the server container
- If you are running via Docker Compose or Kubernetes, set the environment variables on the frontend and worker containers
Using S3 for the Search Job results bucket
Set the following environment variables to target an S3 bucket you've already provisioned. Authentication can be done through an access and secret key pair (and optional session token), or via the EC2 metadata API.
Never commit AWS access keys in Git. You should consider using a secret handling service offered by your cloud provider.
SEARCH_JOBS_UPLOAD_BACKEND=S3
SEARCH_JOBS_UPLOAD_BUCKET=<my bucket name>
SEARCH_JOBS_UPLOAD_AWS_REGION=us-east-1
SEARCH_JOBS_UPLOAD_AWS_ENDPOINT=https://s3.us-east-1.amazonaws.com
SEARCH_JOBS_UPLOAD_AWS_ACCESS_KEY_ID=<your access key>
SEARCH_JOBS_UPLOAD_AWS_SECRET_ACCESS_KEY=<your secret key>
SEARCH_JOBS_UPLOAD_AWS_SESSION_TOKEN=<your session token> (optional)
SEARCH_JOBS_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true (optional; set to use the EC2 metadata API over static credentials)
If a non-default region is supplied, ensure that the subdomain of the
endpoint URL (the AWS_ENDPOINT value) matches the target region.
You don't need to set the SEARCH_JOBS_UPLOAD_AWS_ACCESS_KEY_ID environment
variable when using SEARCH_JOBS_UPLOAD_AWS_USE_EC2_ROLE_CREDENTIALS=true
because role credentials will be automatically resolved. Attach the IAM role
to the EC2 instances hosting the frontend and worker containers in a
multi-node environment.
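For the role to be assumable by those EC2 instances at all, it needs the standard EC2 trust policy and must be attached via an instance profile. The policy below is the stock AWS form, not Sourcegraph-specific:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```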
Using GCS for the Search Job results bucket
Set the following environment variables to target a GCS bucket you've already provisioned.
SEARCH_JOBS_UPLOAD_BACKEND=GCS
SEARCH_JOBS_UPLOAD_BUCKET=<my bucket name>
SEARCH_JOBS_UPLOAD_GCP_PROJECT_ID=<my project id>
SEARCH_JOBS_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE=</path/to/file> (optional)
SEARCH_JOBS_UPLOAD_GOOGLE_APPLICATION_CREDENTIALS_FILE_CONTENT=<{"my": "content"}> (optional)
If you are running on GKE with Workload Identity, or otherwise relying on
Application Default Credentials, you can omit the GCS credentials file
variables. Grant the roles/storage.objectAdmin role to the service accounts used by
the frontend and worker services in a GKE environment.
Automatically provision the Search Job results bucket
If you would like to allow your Sourcegraph instance to manage the creation and lifecycle configuration of the target bucket, set the following environment variable:
This requires additional bucket-management permissions from your configured storage vendor (AWS or GCP).
SEARCH_JOBS_UPLOAD_MANAGE_BUCKET=true