Database Debugging Standard Operating Procedure

1. Introduction
    This Standard Operating Procedure defines a systematic process for analyzing database query failures in SageMaker Studio notebooks, identifying their root cause, and recommending appropriate fixes.

2. Input specification 
    You will receive a JSON file with the following fields: 

    {
        "path_to_notebook": "<string>",
        "failed_cell_id": "<string>",
        "failed_cell_type": "database",
        "failed_cell_content": "<string>",
        "error_message": "<string>",
        "database_type": "<ATHENA|REDSHIFT>",
        "connection_name": "<string>",
        "session_type": "<string>",
        "database_configuration": {
            "<config_key>": "<config_value>"
        },
        "database_connection_status": {
            "<status_key>": "<status_value>"
        },
        "database_error_details": {
            "<error_key>": "<error_value>"
        },
        "database_performance_metrics": {
            "<metric_key>": "<metric_value>"
        }
    }
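    A minimal Python sketch of loading and checking this input before analysis begins. It assumes the input arrives as a UTF-8 JSON file and that rejecting a malformed payload early is preferable to analyzing partial data. The helper names (load_input, validate_input) are hypothetical and not part of any SageMaker API.

```python
import json
from pathlib import Path
from typing import Any

# Top-level fields listed in the input specification.
REQUIRED_FIELDS = (
    "path_to_notebook", "failed_cell_id", "failed_cell_type", "failed_cell_content",
    "error_message", "database_type", "connection_name", "session_type",
    "database_configuration", "database_connection_status",
    "database_error_details", "database_performance_metrics",
)
# The last four fields hold key/value objects rather than strings.
OBJECT_FIELDS = REQUIRED_FIELDS[-4:]
SUPPORTED_DATABASE_TYPES = {"ATHENA", "REDSHIFT"}


def validate_input(payload: dict[str, Any]) -> dict[str, Any]:
    """Return the payload unchanged, or raise ValueError if its shape is wrong."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if payload["failed_cell_type"] != "database":
        raise ValueError(f"unexpected failed_cell_type: {payload['failed_cell_type']!r}")
    if payload["database_type"] not in SUPPORTED_DATABASE_TYPES:
        raise ValueError(f"unsupported database_type: {payload['database_type']!r}")
    for name in OBJECT_FIELDS:
        if not isinstance(payload[name], dict):
            raise ValueError(f"{name} must be a JSON object")
    return payload


def load_input(path: str) -> dict[str, Any]:
    """Read the input JSON file and validate it before any analysis starts."""
    with Path(path).open(encoding="utf-8") as handle:
        return validate_input(json.load(handle))
```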

3. Root cause analysis
    3.1 Understand the error
        Identify the exact error message or stack trace being reported. Use failed_cell_id to locate the failed cell in the notebook at path_to_notebook, and read its failed_cell_content.
        Record the database metadata: database_type, connection_name, and session_type. Inspect database_error_details and database_connection_status.
        Review database_performance_metrics to determine whether resource constraints or performance issues contributed to the failure.
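        Locating the failed cell can be sketched in Python, assuming the notebook is a standard Jupyter .ipynb file (nbformat 4.5 or later, where each cell carries an "id" field). The helper names are hypothetical. Comparing the saved cell source with failed_cell_content is an added check that flags a notebook edited after the failure occurred.

```python
import json
from pathlib import Path
from typing import Any, Optional


def find_cell(notebook: dict[str, Any], cell_id: str) -> Optional[dict[str, Any]]:
    """Return the cell whose "id" matches cell_id, or None if it is absent."""
    for cell in notebook.get("cells", []):
        if cell.get("id") == cell_id:
            return cell
    return None


def cell_source(cell: dict[str, Any]) -> str:
    """.ipynb files may store a cell's source as one string or as a list of lines."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def load_failed_cell(path_to_notebook: str, failed_cell_id: str,
                     failed_cell_content: str) -> dict[str, Any]:
    """Load the notebook, find the failed cell, and check it against the report."""
    with Path(path_to_notebook).open(encoding="utf-8") as handle:
        notebook = json.load(handle)
    cell = find_cell(notebook, failed_cell_id)
    if cell is None:
        raise LookupError(f"cell {failed_cell_id!r} not found in {path_to_notebook}")
    # A mismatch suggests the notebook changed after the failure was reported.
    matches = cell_source(cell).strip() == failed_cell_content.strip()
    return {"cell": cell, "content_matches_report": matches}
```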

    3.2 Classify the failure type
        Using the evidence gathered in step 3.1, assign the failure to one of the following categories:
            1. "connection_error" (e.g., authentication failure, network issues, endpoint unreachable)
            2. "query_syntax_error" (e.g., invalid SQL syntax, undefined columns)
            3. "permission_error" (e.g., insufficient privileges, access denied)
            4. "data_issue" (e.g., missing table, schema mismatch, data type conversion error)
            5. "resource_constraint" (e.g., query timeout, memory limit exceeded)
            6. "configuration_error" (e.g., incorrect database parameters, invalid connection settings)
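        As a first pass, the classification can be approximated by keyword matching on error_message. The patterns below are illustrative assumptions, not an exhaustive or official list of Athena or Redshift error strings. A match is only a starting hypothesis to confirm against database_error_details, and anything unmatched falls back to "unknown", in line with the constraints in section 4.

```python
import re

# Illustrative patterns only (assumed, not official error strings).
# The first match wins, so more specific categories are listed first.
FAILURE_PATTERNS: list[tuple[str, str]] = [
    ("permission_error",
     r"access ?denied|permission denied|not authorized|insufficient privilege"),
    ("connection_error",
     r"authentication failed|could not connect|connection (refused|reset|timed out)|unreachable"),
    ("query_syntax_error",
     r"syntax error|mismatched input|column .* (cannot be resolved|does not exist)"),
    ("data_issue",
     r"table_not_found|table .* (not found|does not exist)|relation .* does not exist"
     r"|schema mismatch|cannot cast|invalid input syntax"),
    ("resource_constraint",
     r"timed out|timeout|out of memory|memory limit|exhausted resources|exceeded"),
    ("configuration_error",
     r"invalid (connection|configuration|parameter)|output location"),
]


def classify_failure(error_message: str) -> str:
    """Return the first matching category, or "unknown" if nothing matches."""
    for category, pattern in FAILURE_PATTERNS:
        if re.search(pattern, error_message, flags=re.IGNORECASE):
            return category
    return "unknown"
```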

    3.3 Extract the root cause
        Identify the most relevant evidence supporting the classification.
        The root cause report must include:
            1. A summary of the root cause
            2. The specific component affected (connection, query, permissions, etc.)
            3. Relevant database configuration values involved
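        One way to keep the three required items together, along with the category and supporting evidence, is a small record type. This is a hypothetical structure, not a required output format. The configuration keys in the test values (workgroup, catalog) are placeholders, since the real keys depend on database_configuration.

```python
from dataclasses import asdict, dataclass, field
from typing import Any

# Categories from step 3.2, plus "unknown" for inconclusive evidence (section 4).
CATEGORIES = {
    "connection_error", "query_syntax_error", "permission_error",
    "data_issue", "resource_constraint", "configuration_error", "unknown",
}


@dataclass
class RootCause:
    category: str   # one of CATEGORIES
    summary: str    # item 1: summary of the root cause
    component: str  # item 2: affected component, e.g. "connection", "query", "permissions"
    config_values: dict[str, Any] = field(default_factory=dict)  # item 3
    evidence: list[str] = field(default_factory=list)  # quotes from the error details

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def relevant_config(database_configuration: dict[str, Any], keys: list[str]) -> dict[str, Any]:
    """Keep only the configuration entries that the evidence points to."""
    return {key: database_configuration[key] for key in keys if key in database_configuration}
```

Calling asdict() on a RootCause yields a plain dictionary that can be serialized to JSON for the report.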

    3.4 Generate fix recommendations
        Suggest configuration updates and/or query updates.
        Use the following fix strategies:
            - Leave unrelated cells unchanged; modify only the cells that caused the failure, and explain each change in a comment.
            - For query syntax errors, provide the corrected SQL query with explanations.
            - For permission issues, suggest the required permissions or roles.
            - For connection issues, recommend connection parameter changes or network configuration checks.
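        The rule of leaving unrelated cells unchanged can be sketched as a function that rewrites only the failed cell in a copy of the notebook and records each reason as a SQL "--" comment, a comment style both Athena and Redshift accept. It assumes the standard .ipynb cell layout. If the corrected cell begins with a cell-magic line (any line starting with %%), the comments are placed after that line so the magic stays first. The function name is hypothetical.

```python
import copy
from typing import Any


def apply_cell_fix(notebook: dict[str, Any], cell_id: str,
                   fixed_sql: str, reasons: list[str]) -> dict[str, Any]:
    """Return a copy of the notebook in which only the failed cell is rewritten."""
    patched = copy.deepcopy(notebook)  # never mutate the caller's notebook
    for cell in patched.get("cells", []):
        if cell.get("id") != cell_id:
            continue  # unrelated cells are left exactly as they were
        header = "".join(f"-- Fix: {reason}\n" for reason in reasons)
        lines = fixed_sql.splitlines(keepends=True)
        # Keep a leading cell-magic line first, so the cell still runs.
        magic = lines[0] if lines and lines[0].startswith("%%") else ""
        if magic and not magic.endswith("\n"):
            magic += "\n"
        body = "".join(lines[1:]) if magic else fixed_sql
        cell["source"] = magic + header + body
        return patched
    raise LookupError(f"cell {cell_id!r} not found; no cells were changed")
```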

4. Constraints
    - Do not guess: if the error message, error details, and metrics are inconclusive, report "unknown" with reasoning
    - Be safe: never suggest reducing security measures or bypassing authentication
    - Be concise: summarize only what's relevant
    - For Athena: consider S3 bucket permissions, query execution time limits, and data format issues
    - For Redshift: consider cluster status, concurrency scaling, WLM configuration, and query optimization
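    For the service-specific checks above, the helpers below reduce AWS API responses to the fields that usually matter. They are a sketch that operates on plain dictionaries shaped like the responses of the boto3 Athena get_query_execution and Redshift describe_clusters calls, so they can be exercised without AWS credentials. The helper names are hypothetical, and the Redshift check covers provisioned clusters only, not Redshift Serverless.

```python
from typing import Any


def summarize_athena_execution(response: dict[str, Any]) -> dict[str, Any]:
    """Extract state, failure reason, output location, and cost signals."""
    execution = response.get("QueryExecution", {})
    status = execution.get("Status", {})
    stats = execution.get("Statistics", {})
    return {
        "state": status.get("State"),
        "reason": status.get("StateChangeReason"),
        # AthenaError.ErrorCategory: 1 = system, 2 = user, 3 = other.
        "error_category": status.get("AthenaError", {}).get("ErrorCategory"),
        # A missing output location often points to an S3 or workgroup setting.
        "output_location": execution.get("ResultConfiguration", {}).get("OutputLocation"),
        "engine_ms": stats.get("EngineExecutionTimeInMillis"),
        "scanned_bytes": stats.get("DataScannedInBytes"),
    }


def unavailable_redshift_clusters(response: dict[str, Any]) -> list[str]:
    """Return identifiers of clusters whose status is anything but "available"."""
    return [
        cluster["ClusterIdentifier"]
        for cluster in response.get("Clusters", [])
        if cluster.get("ClusterStatus") != "available"
    ]
```

In a live session, the input dictionaries would come from calls such as boto3.client("athena").get_query_execution(QueryExecutionId=...) or boto3.client("redshift").describe_clusters(ClusterIdentifier=...), using the query execution ID or cluster identifier found in the input's database fields.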
