Secure Handling of Secrets

Your flows are programs that run on the Hyperscience Platform and often need to interact with other systems, for example by running DB queries or making HTTP requests. In any production setting, such external systems require some form of authentication, so your flows need access to secrets such as passwords and tokens.

Some of the important rules to follow when dealing with any secrets are:

Never paste them directly into your code.
Never commit them to your code repository.
Make sure they don’t end up in logs.
Make sure they don’t end up in files that can be sent over insecure channels.

Here is how to work with secrets in the context of flows in a secure manner that avoids these pitfalls.

Defining secrets

Normally, your flows define fields that users can configure or fill in. For normal fields, the value that the user configures is saved in the flow. If the user then exports that flow, the value appears in the exported flow JSON file. Additionally, at runtime, while the Hyperscience instance executes the flow, these values are circulated and stored unencrypted in long-lived DB tables, logs, etc.

This behavior is not desired for sensitive values. Therefore, Hyperscience allows you to mark some fields with a "secret": true property in the field’s manifest. Such fields can be edited in the Flow Studio UI in the application, but their values are never sent to the browser or saved in the flow. Instead, those values are saved in the instance DB in a secure, encrypted manner separately from the flow definition. The values are system-specific and not exported or imported together with the flow.

In the Flows SDK, you can mark a field as a secret as follows:

Parameter(
    name="secret_flow_input",
    title='Secret Flow Input',
    type='string',
    secret=True,
    optional=True,
)

Once edited, the flow only keeps a reference to the secret, but not the value itself:

"input": {
  "secret_flow_input": "${system.secrets.68b2c988dce64bb1a66e00583a0a34f1__ea2808de2309493a8c46221bfe71993b__secret_flow_input}"
},
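As a rough illustration, an exported flow file could be checked to confirm that its inputs hold references rather than values. The sketch below is not part of the Flows SDK: the helper secret_field_name and the regular expression are hypothetical, and the pattern is inferred only from the single reference shown above (two 32-character hex identifiers followed by the field name), so the real format may differ.

```python
import json
import re

# Pattern inferred from the one reference shown above; treat it as an assumption.
SECRET_REF = re.compile(r"^\$\{system\.secrets\.[0-9a-f]{32}__[0-9a-f]{32}__(\w+)\}$")


def secret_field_name(value):
    """Return the field name if value is a secret reference, else None."""
    if not isinstance(value, str):
        return None
    match = SECRET_REF.match(value)
    return match.group(1) if match else None


# A trimmed-down stand-in for an exported flow; "plain_field" is made up.
exported = json.loads('''
{
  "input": {
    "secret_flow_input": "${system.secrets.68b2c988dce64bb1a66e00583a0a34f1__ea2808de2309493a8c46221bfe71993b__secret_flow_input}",
    "plain_field": "hello"
  }
}
''')

# Map each input that holds a secret reference to the referenced field name.
referenced = {
    key: secret_field_name(value)
    for key, value in exported["input"].items()
    if secret_field_name(value) is not None
}
print(referenced)
```

Only the input holding a reference is reported; ordinary values such as "hello" are ignored.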

When you view the flow run at runtime, you will also see similar references to secrets rather than the actual values.

If a flow has a required secret field, and that flow is imported into a system that has no value defined for that field, a validation error will occur in the flow, and the user will be prompted to provide a value for that field.

Secrets are kept in the instance DB and are neither defined inside the flow nor exported together with it. Therefore, the same flow can be used across several environments (e.g., development, UAT, production) while the secrets it needs differ between those environments, which is a common requirement.

The default Document Processing flow in Hyperscience itself uses secrets for storing tokens, such as the ones necessary for accessing S3 and OCS to download submission data when creating the submission. All Input and Output Blocks in our library also use "secret": true for any field that contains potentially sensitive information - passwords, tokens, etc.

Referencing and using secrets

Blocks in the flow can refer to the values of secret fields as they do for normal fields (e.g., via the workflow_input("some_secret_field") construct). There is an example flow below that illustrates how values of secret fields are referenced.

The system makes sure that these secret values are fetched from the DB and decrypted only at the exact moment they are needed by a block in order to process a task (e.g., to make a DB query or an HTTP request). Afterwards, the block “forgets” the value.

As a result, the secret value itself does not linger in memory, does not leak in logs, and does not stay in the flow run data. It cannot be accessed via user-accessible APIs and does not reach any user’s browser, even for the purposes of editing flows. It does not stay in the definition of the flow (neither in the JSON nor the Python file) or the code that’s backing that flow’s Custom Code Blocks. Therefore, there is no risk of the secret value being exported and sent over insecure channels. Only references to secrets are stored in all such contexts, not the secret values themselves.

There are also security restrictions in place so that one flow cannot use or access the secrets of another flow.

Sample flow that uses secrets

import sys

from flows_sdk.blocks import CodeBlock
from flows_sdk.flows import Flow, Manifest, Parameter
from flows_sdk.package_utils import export_flow
from flows_sdk.utils import str_to_deterministic_uuid_4, workflow_input

SECRETS_IN_CCB_FLOW_IDENTIFIER = 'SECRET_IN_CCB_FLOW'
SECRET_IN_CCB_FLOW_UUID = str_to_deterministic_uuid_4(SECRETS_IN_CCB_FLOW_IDENTIFIER)


def entry_point_flow() -> Flow:
    return secrets_in_ccb_flow()


class FlowInputs:
    SECRET_FLOW_INPUT = 'secret_flow_input'


def secrets_in_ccb_flow() -> Flow:
    def use_secret(secret: str) -> str:
        import requests  # type: ignore[import-untyped]

        headers = {'Authorization': 'Token ' + secret}
        requests.post(
            'http://localhost:8080/api/v5/flows/dd0ad61c-6e6b-4097-a4f5-db9511e002ab/deploy',
            headers=headers,
        )

        return "don't return secrets as they will leak in plain text"

    use_secret_ccb = CodeBlock(
        reference_name='use_secret',
        code=use_secret,
        code_input={'secret': workflow_input(FlowInputs.SECRET_FLOW_INPUT)},
    )

    return Flow(
        depedencies={},
        title='Flow that uses secrets in a CCB',
        description='Flow that uses secrets in a CCB',
        blocks=[use_secret_ccb],
        owner_email='flows.sdk@hyperscience.com',
        manifest=Manifest(
            identifier=SECRETS_IN_CCB_FLOW_IDENTIFIER,
            input=[
                (
                    Parameter(
                        name=FlowInputs.SECRET_FLOW_INPUT,
                        title='Secret string',
                        type='string',
                        secret=True,
                    )
                )
            ],
        ),
        uuid=SECRET_IN_CCB_FLOW_UUID,
    )


if __name__ == '__main__':
    export_filename = None
    if len(sys.argv) > 1:
        export_filename = sys.argv[1]

    export_flow(flow=entry_point_flow(), filename=export_filename)

Download flow_with_secret.py

Making secure requests to the Hyperscience instance

Secret fields give you one way to store tokens that are necessary to authenticate into external systems. If you need to make requests to the Hyperscience instance itself, you may consider using the HyperscienceRestApiBlock, which will take care of the authentication without the user having to provide tokens.

Security guidelines

The Hyperscience Platform will store the user secrets securely and will make them available to blocks in a secure manner, but it cannot control what the block will do with the secret. To ensure that the secrets do not leak, follow these guidelines for any code blocks or flows that you implement that use secrets:

Never log the secret value.
Do not return the secret value as part of the block’s output. If this happens, the secret value will leak into and be stored in the long-lived flow-run data, unencrypted.
Do not use the secret to make insecure requests to external systems.
Do not include the secret value when raising exceptions, which would cause it to be stored unencrypted in logs.
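A minimal sketch of how a code block body might follow these guidelines is shown below. It is an illustration only: UpstreamError, call_with_secret, and the injected send callable are hypothetical names, and send stands in for whatever HTTPS client call the block would really make.

```python
class UpstreamError(Exception):
    """Raised instead of the original error so no secret-bearing text propagates."""


def call_with_secret(secret, send):
    """Call send(headers) and keep the secret out of errors and output.

    `send` is a hypothetical stand-in for an HTTPS client call.
    """
    headers = {"Authorization": "Token " + secret}
    try:
        status = send(headers)
    except Exception as exc:
        # Report only the exception type; its message may echo the headers.
        # `from None` hides the original exception when the traceback is rendered.
        raise UpstreamError("request failed: " + type(exc).__name__) from None
    # Return a status only, never the secret or the headers.
    return {"status": status}


def failing_send(headers):
    # Simulates a client whose error message would leak the headers.
    raise ValueError("bad request with headers " + repr(headers))


try:
    call_with_secret("s3cr3t-token", failing_send)
except UpstreamError as err:
    message = str(err)

print(message)
```

The design choice is to treat any exception text produced while the secret is in scope as untrusted and to replace it with a message built only from non-sensitive parts.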

Secrets cannot be nested within strings or objects

Some templating constructs that work for normal fields are deliberately disallowed for secrets for security reasons.

The following constructs will not work as desired, regardless of whether the flow is defined in Python, as illustrated below, or in the equivalent JSON.

You will not get an error when compiling or importing flows that attempt to use these constructs. At runtime, however, the references to the secrets will not be “resolved” into their actual values: the block will receive a reference such as $[hs.system_secrets.e769f12ba5b44d17a4f8a052b7c15c02__44487577ed5042e2adbd58dcccd313ed__secret] instead of the actual secret value.

# Using a reference to a secret in the middle of a string is not allowed.
# The example would compile and run, but will not use the correct secret value.
HttpRestBlock(
    reference_name='llm_chat_completion',
    title='LLM Chat Completion',
    method='POST',
    endpoint='https://api.openai.com/v1/chat/completions',
    authorization_type='http_header',
    auth_params=HttpHeaderAuthorizationParams(
        # secret in string below
        authorization_header=f'Bearer {workflow_input("bearer_token")}',  # not allowed
        authorization_header_name='Authorization',
    ),
    json={ ... },
)

# Nesting a reference to a secret inside a larger object is not allowed.
# The example would compile and run, but will not use the correct secret value.
PythonBlock(
    code=ccb_fn,
    reference_name='ccb',
    code_input={
        'headers': {
            'Authorization': workflow_input('deeply_nested_secret'),  # not allowed
            'Content-Type': 'text/html; charset=utf-8',
        }
    },
)
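The effect can be pictured with a toy resolver. This is not how the platform is implemented; it is a simplified model, assuming that only top-level input values consisting entirely of a single reference get resolved. The reference syntax, SECRET_STORE, and resolve_inputs are all hypothetical.

```python
import re

# Hypothetical, simplified reference syntax used only for this illustration.
WHOLE_REF = re.compile(r"^\$\{secret\.(\w+)\}$")

# Hypothetical stand-in for the encrypted secret store.
SECRET_STORE = {"bearer_token": "abc123"}


def resolve_inputs(inputs):
    """Resolve only top-level values that consist entirely of one reference."""
    resolved = {}
    for key, value in inputs.items():
        match = WHOLE_REF.match(value) if isinstance(value, str) else None
        resolved[key] = SECRET_STORE[match.group(1)] if match else value
    return resolved


# A top-level reference is replaced by the secret value.
top_level = resolve_inputs({"token": "${secret.bearer_token}"})
# A reference embedded in a longer string is left untouched.
in_string = resolve_inputs({"header": "Bearer ${secret.bearer_token}"})
# A reference nested inside an object is left untouched as well.
nested = resolve_inputs({"headers": {"Authorization": "${secret.bearer_token}"}})

print(top_level)
print(in_string)
print(nested)
```

In this model, the second and third calls pass the unresolved reference text on to the block, which mirrors the runtime behavior described above.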

Additionally, as stated above, you should not return a secret inside the output of any block. Therefore, it is not an option to use Custom Code Blocks in order to build expressions like “Bearer <the_secret_token>” and pass them into a block (e.g., HttpRestBlock).

The secure alternative here is to define a secret value that contains the whole string or object which encapsulates the actual secret data. Then, the full secret should be passed directly into the block.

For the first example above, use the following approach:

# In the flow manifest, the secret field would look like this:
Parameter(
    name='authorization_header_with_secret_inside',
    title='The full authorization header',
    type='string',
    secret=True,
    optional=True,
)

# When editing this field as a secret, the secret value should be defined
# as "Bearer <the_secret_token>"

# The block would use the full value, not "combine" it from secret and non-secret parts
HttpRestBlock(
    reference_name='llm_chat_completion',
    title='LLM Chat Completion',
    method='POST',
    endpoint='https://api.openai.com/v1/chat/completions',
    authorization_type='http_header',
    auth_params=HttpHeaderAuthorizationParams(
        authorization_header=workflow_input("authorization_header_with_secret_inside"),
        authorization_header_name='Authorization',
    ),
    json={ ... },
)

For the second example, use the following approach:

# In the flow manifest, the secret field would look like this:
Parameter(
    name='headers_with_secrets_inside',
    title='Headers that have secrets inside',
    type='json',  # important so that we can store a full object, not just a string
    secret=True,
    optional=True,
)

"""
When editing this field as a secret, the secret value
should be defined as a json object:
{
    "Authorization": "contains secrets",
    "Content-Type": "text/html; charset=utf-8"
}
"""

# The block would use the full value, not "combine" it from secret and non-secret parts.
# For PythonBlocks, code_input is the equivalent of the "input" for other blocks,
# so having secrets at the top level of the "code_input" dictionary works OK.
PythonBlock(
    code=ccb_fn,
    reference_name='ccb',
    code_input={'headers': workflow_input('headers_with_secrets_inside')},
)
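The ccb_fn function referenced above is not shown in this document. A hypothetical body might look like the sketch below, assuming that a secret of type 'json' reaches the function as a Python dict. It uses the headers and returns only non-sensitive metadata.

```python
def ccb_fn(headers):
    """Hypothetical block body: uses the resolved headers, returns no secret data."""
    if "Authorization" not in headers:
        # The message names the missing key only, never any header value.
        raise ValueError("missing required header: Authorization")
    # A real block would pass `headers` to an HTTPS client here.
    # Only non-sensitive metadata is returned, so nothing leaks into flow-run data.
    return {"header_names": sorted(headers)}


# Local dry run with a placeholder value instead of a real secret.
result = ccb_fn({
    "Authorization": "Bearer <the_secret_token>",
    "Content-Type": "text/html; charset=utf-8",
})
print(result)
```

Returning the header names rather than the headers themselves keeps the block's output safe to store in the flow-run data.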

The example flow below demonstrates these limitations. To try it out, generate the flow JSON, import it, and then edit the secrets inside. For the “Secret object” field, enter valid JSON, e.g. {"password": "my_secret"}; the “Secret string” field takes an ordinary string. After you configure and run the flow, inspect the Inputs/Outputs of the blocks in the “View Flow Run” UI to see the limitations explained above.

Download secrets_limitations.py