Borbin the 🐱

  • ChatGPT result encoding

    📅 13 February 2023 · Software

    ChatGPT returns the result as a UTF-8 byte sequence in text form: every byte of the UTF-8 encoding arrives as a separate character. Everything outside 7-bit ASCII, for example accented characters or languages with other scripts, ends up as unreadable text.


    For example a result returned for the Spanish language:

    Â¿QuÃ© habitaciones tienen disponibles?

    Expected result:

    ¿Qué habitaciones tienes disponibles?


    Result returned for the Japanese language (non-printable control characters omitted):

    ã©ã®é¨å±ãå©ç¨å¯è½ã§ãã?

    Expected result:

    どの部屋が利用可能ですか? 


    The result has to be read with the iso-8859-1 encoding and then decoded as UTF-8.
    For example, 'é' is encoded in UTF-8 as the byte sequence 0xc3 0xa9. Read as iso-8859-1, these bytes are the characters 'Ã' (0xc3) and '©' (0xa9).
    So instead of 'é', ChatGPT sends 'Ã©', which is the raw UTF-8 byte sequence with one character per byte.
    To get the correct Unicode string, the characters of the string need to be mapped back to bytes:

    [byte[]]$byteContent = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes($resultText)

    This is done with the iso-8859-1 encoding, which maps each char to exactly one byte with the same value. The resulting byte array can then be decoded correctly as UTF-8 into a Unicode string:

    # Run chatGPT query.
    $result = (Invoke-RestMethod @RestMethodParameter)
    
    [string]$resultText = $result.choices[0].text
    [byte[]]$byteContent = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes($resultText)
    
    # Get the decoded result.
    [string]$text = [System.Text.Encoding]::UTF8.GetString($byteContent)
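
    The round trip can be checked without calling the API. The following is only a sketch and not part of the original script: it builds the garbled text locally and repairs it with a helper whose name, Repair-Utf8Text, is made up for this example. The helper skips the conversion when the text contains chars above 0xFF, because iso-8859-1 cannot map those to a single byte. Non-ASCII chars are written as char codes so the result does not depend on the encoding of the script file.

    ```powershell
    # Hypothetical helper (not from the original post): repairs text whose
    # UTF-8 bytes were turned into one char per byte.
    function Repair-Utf8Text {
        param([string]$Text)

        # iso-8859-1 can only map chars up to 0xFF to a byte. A char above that
        # means the text is already real Unicode, so it is returned unchanged.
        foreach ($char in $Text.ToCharArray()) {
            if ([int]$char -gt 0xFF) { return $Text }
        }

        [byte[]]$bytes = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes($Text)
        return [System.Text.Encoding]::UTF8.GetString($bytes)
    }

    # 'Qué', with the 'é' written as a char code.
    [string]$expected = "Qu$([char]0xE9)"

    # Simulate the API result: UTF-8 bytes, each byte turned into one char.
    [byte[]]$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($expected)
    [string]$garbled = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetString($utf8Bytes)

    [string]$repaired = Repair-Utf8Text $garbled

    $garbled.Length            # 4: 'Q', 'u', 0xC3, 0xA9
    $repaired.Length           # 3
    $repaired -ceq $expected   # True
    ```

    The guard is deliberately simple. Text that is already correct and contains only chars up to 0xFF, such as a plain 'é', would still be converted and damaged, so the helper should only be applied to results known to be garbled.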


    Here is a full example of how to use ChatGPT in PowerShell:

    # https://platform.openai.com/account/api-keys
    $apikey = "sk-...."
    
    <#
    – Model [Required]
    ChatGPT offers multiple models. Each model has its own features, strengths, and use cases. Select one model when building the request. The models are:
    
    text-davinci-003    Most capable GPT-3 model. It can do any task the other models can do, often with higher quality, longer output, and better instruction-following. It also supports inserting completions within the text.
    text-curie-001      Very capable, but faster and lower cost than Davinci.
    text-babbage-001    Capable of straightforward tasks, very fast, and lower cost.
    text-ada-001        Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost
    #>
    
    $requestBody = @{
        prompt      = "What is the capital of Germany?"
        model       = "text-ada-001"
        temperature = 1
        stop        = "."
    } | ConvertTo-Json
    
    $header = @{ 
        Authorization = "Bearer $apikey"
    }
    
    $restMethodParameter = @{
        Method      = 'Post'
        Uri         = 'https://api.openai.com/v1/completions'
        body        = $requestBody
        Headers     = $header
        ContentType = 'application/json'
    }
    
    # Run chatGPT query.
    $result = (Invoke-RestMethod @restMethodParameter)
    
    [string]$resultText = $result.choices[0].text
    [byte[]]$byteContent = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes($resultText)
    
    # Get the decoded result.
    [string]$text = [System.Text.Encoding]::UTF8.GetString($byteContent)
  • Scan text with regex in PowerShell

    📅 24 April 2022 · Software

    The named group capture (?<name>exp) in a regex is an easy way to scan content. This example gets the text enclosed in quotes in a string. This is how it is done in PowerShell:

    # Get the text enclosed in quotes.
    [string]$text = 'This is an "example text".'
    [string]$textRegex = '\"(?<Text>.*?)\"'
    
    if ($text -match $textRegex) {
        $matches['Text']
    }

    This outputs
    example text


    Or split a formatted string into parts. For example the assignment structure 'id=value':

    # Parse the id and value of the text.
    [string]$text = '  id123 = abc  '
    [string]$idValueRegex = "^\s*(?<id>\w+?)\s*=\s*`"?(?<value>.+?)`"?\s*$"
    
    if ($text -match $idValueRegex) {
        "id=$($matches['id']), value=$($matches['value'])"
    }

    This outputs
    id=id123, value=abc
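
    The same pattern can be applied to several lines at once. The following sketch assumes some made-up input lines and collects the parsed pairs in an ordered hashtable. The variable names are chosen for this example only.

    ```powershell
    # Hypothetical input lines, as they could appear in a simple settings file.
    [string[]]$lines = @(
        'name = "Borbin"',
        '  id123 = abc  ',
        'this line has no assignment'
    )
    [string]$idValueRegex = '^\s*(?<id>\w+?)\s*=\s*"?(?<value>.+?)"?\s*$'

    $settings = [ordered]@{}
    foreach ($line in $lines) {
        # $matches is only updated on a successful match, so test the result first.
        if ($line -match $idValueRegex) {
            $settings[$matches['id']] = $matches['value']
        }
    }

    $settings['name']     # Borbin
    $settings['id123']    # abc
    $settings.Count       # 2
    ```

    The regex is written in single quotes here, so the double quotes inside need no backtick escape.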


    Or parse a pattern, for example the content of each bracket in " abc { 123 } { def } 456 {xyz}"

    [string]$text = " abc { 123 } { def } 456 {xyz}"
    [string]$bracketRegex = "[{]\s*(?<Text>.*?)\s*[}]"
    
    ([regex]$bracketRegex).Matches($text) | % {
        [System.Text.RegularExpressions.Match]$match = $_
        [string]$value = $match.Groups["Text"].Value
    
        $value
    }

    This outputs
    123
    def
    xyz

  • Using List in PowerShell

    📅 24 April 2022 · Software

    PowerShell has good support for arrays and lists, but arrays have a fixed size: adding an element with += creates a new array and copies all existing elements on each change, which is inefficient for large lists.
    The simplest solution is to use the .NET List class:

    [System.Collections.Generic.List[string]]$content = [System.Collections.Generic.List[string]]::new()

    $content.Add("line1")
    $content.Add("line2")
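
    As a small sketch of the difference, the following builds the same content once with an array and += and once with a List. The item values are made up for the example.

    ```powershell
    # Array: += allocates a new array and copies all existing elements each time.
    [string[]]$array = @()
    foreach ($i in 1..3) {
        $array += "line$i"
    }

    # List: Add() appends in place.
    $list = [System.Collections.Generic.List[string]]::new()
    foreach ($i in 1..3) {
        $list.Add("line$i")
    }

    # AddRange appends many items at once, ToArray converts back when an array is needed.
    $list.AddRange([string[]]@("line4", "line5"))
    [string[]]$result = $list.ToArray()

    $list.Count          # 5
    $result -join ","    # line1,line2,line3,line4,line5
    ```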
  • Text file encoding with PowerShell

    📅 24 April 2022 · Software

    Text files contain text in a certain encoding. The common ASCII symbols fit into one byte and are stored as such in the file. Extended chars and other glyphs need more than one byte. The standard for this is Unicode.


    Common Unicode encodings are utf-8 and utf-16.
    utf-8 encodes 7-bit chars as they are and is one of the most used formats because it results in small files, as most text is 7-bit anyway. All other chars are encoded as a sequence of two to four bytes.
    utf-16 uses 2 bytes per char in most cases and surrogate pairs (4 bytes) for code points outside the Basic Multilingual Plane. It is also known as 'Unicode', with the option of big or little endian byte order. The .NET string class uses utf-16 as well. As with the file format, don't assume each char is two bytes.
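
    The sizes can be checked with the .NET encoding classes. This is a minimal sketch. The sample chars are chosen for this example and are built from code points, so the encoding of the script file does not matter.

    ```powershell
    $utf8  = [System.Text.Encoding]::UTF8
    $utf16 = [System.Text.Encoding]::Unicode   # utf-16, little endian

    $samples = [ordered]@{
        'a (U+0061)'        = 'a'
        'e acute (U+00E9)'  = [string][char]0xE9
        'katakana (U+30D1)' = [string][char]0x30D1
        'cat (U+1F63A)'     = [char]::ConvertFromUtf32(0x1F63A)
    }

    foreach ($name in $samples.Keys) {
        [string]$s = $samples[$name]
        "{0}: chars={1}, utf-8 bytes={2}, utf-16 bytes={3}" -f $name, $s.Length, $utf8.GetByteCount($s), $utf16.GetByteCount($s)
    }

    # a (U+0061): chars=1, utf-8 bytes=1, utf-16 bytes=2
    # e acute (U+00E9): chars=1, utf-8 bytes=2, utf-16 bytes=2
    # katakana (U+30D1): chars=1, utf-8 bytes=3, utf-16 bytes=2
    # cat (U+1F63A): chars=2, utf-8 bytes=4, utf-16 bytes=4
    ```

    The last line shows the surrogate pair: one glyph, but a .NET string length of 2.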


    The PowerShell functions Get-Content and Set-Content need an encoding to properly read and write the file.
    Without an encoding, Windows PowerShell decodes a file that has no BOM with the system's ANSI code page. Each byte of a multi-byte utf-8 sequence then becomes a separate char, so the loop variable holds only fragments of the original chars and is not very useful. PowerShell 6 and later use utf-8 as the default instead.

    # No encoding.
    Get-Content $textFile | % { 
        $_
    } | Set-Content $textFileOut


    If the encoding is missing when the file is read, the original text content in utf-8:
    😺abcパワーシェル
    will be stored as this instead (shown for the Windows-1252 code page):
    ðŸ˜ºabcãƒ‘ãƒ¯ãƒ¼ã‚·ã‚§ãƒ«

    # Encoding missing, wrong content in output file.
    Get-Content $textFile | % {
        $_
    } | Set-Content -Encoding UTF8 $textFileOut


    The encoding is needed to properly read the chars in a text file:

    # Read utf-8 file and write as utf-8.
    Get-Content -Encoding UTF8 $textFile | % {
        $_
    } | Set-Content -Encoding UTF8 $textFileOut


    # Read utf-8 file and write as Unicode (utf-16).
    Get-Content -Encoding UTF8 $textFile | % {
        $_
    } | Set-Content -Encoding Unicode $textFileOut


    Note: Get-Content will read a Unicode (utf-16) file even when the utf-8 encoding is given, but it won't read a utf-8 file when the Unicode encoding is given. Do not rely on this.
    When the encoding is not known, Get-Content is difficult to use. A better practice is to use the ReadLines API from .NET, which detects utf-8, utf-16 and utf-32 by their BOM and decodes files without a BOM as utf-8:

    # Read any file encoding and write as utf-8.
    [System.IO.File]::ReadLines($textFile) | % { 
        $_
    } | Set-Content -Encoding UTF8 $textFileOut
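
    The detection can be tried out with two temporary files. This is a sketch with made-up sample text: the same string is written as utf-8 without BOM and as utf-16 with BOM, and both files are read back without naming an encoding. A file in a legacy code page without a BOM is not covered by this and would still be decoded as utf-8.

    ```powershell
    # 'abc' followed by a katakana char, built from a code point.
    [string]$sample = "abc$([char]0x30D1)"

    [string]$utf8File  = [System.IO.Path]::GetTempFileName()
    [string]$utf16File = [System.IO.Path]::GetTempFileName()

    # Same text, written as utf-8 without BOM and as utf-16 with BOM.
    [System.IO.File]::WriteAllText($utf8File, $sample, [System.Text.UTF8Encoding]::new($false))
    [System.IO.File]::WriteAllText($utf16File, $sample, [System.Text.Encoding]::Unicode)

    # ReadLines is called without an encoding in both cases.
    [string[]]$fromUtf8  = [System.IO.File]::ReadLines($utf8File)
    [string[]]$fromUtf16 = [System.IO.File]::ReadLines($utf16File)

    $fromUtf8[0] -ceq $sample     # True
    $fromUtf16[0] -ceq $sample    # True

    Remove-Item $utf8File, $utf16File
    ```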


    Whether Set-Content -Encoding UTF8 writes a BOM depends on the version: Windows PowerShell 5.1 always writes one, PowerShell 6 and later do not.
    Use Text.UTF8Encoding to control whether the BOM is written.
    If the Byte Order Mark (BOM) is not needed, use this to write out as utf-8 without BOM:

    # Read any file encoding and write as utf-8 without BOM.
    [string[]]$contentLines = [System.IO.File]::ReadLines($textFile)
    [Text.UTF8Encoding]$encoding = New-Object System.Text.UTF8Encoding $false
    [IO.File]::WriteAllLines($textFileOut, $contentLines, $encoding)

    If the Byte Order Mark (BOM) is needed, set the first constructor arg of the encoding to $true:

    [Text.UTF8Encoding]$encoding = New-Object System.Text.UTF8Encoding $true
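
    To verify the effect of the constructor arg, the first bytes of the written file can be inspected. A sketch using a temporary file. The utf-8 BOM is the byte sequence 0xEF 0xBB 0xBF.

    ```powershell
    [string]$file = [System.IO.Path]::GetTempFileName()

    # Write the same line with and without BOM and look at the first bytes of the file.
    foreach ($withBom in $true, $false) {
        [Text.UTF8Encoding]$encoding = New-Object System.Text.UTF8Encoding $withBom
        [IO.File]::WriteAllLines($file, [string[]]@("abc"), $encoding)

        [byte[]]$bytes = [IO.File]::ReadAllBytes($file)
        [bool]$hasBom = $bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF

        "BOM requested: $withBom, BOM found: $hasBom, first byte: 0x$($bytes[0].ToString('X2'))"
    }

    # BOM requested: True, BOM found: True, first byte: 0xEF
    # BOM requested: False, BOM found: False, first byte: 0x61

    Remove-Item $file
    ```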


    The ReadLines API does not load all content into memory at once and allows very large files to be processed line by line. If you need the file in one string, use this:

    # Read text as one string with any file encoding and write as utf-8 without BOM.
    [string]$content = [System.IO.File]::ReadAllText($textFile)
    [Text.UTF8Encoding]$encoding = New-Object System.Text.UTF8Encoding $false
    [IO.File]::WriteAllText($textFileOut, $content, $encoding)


    XML files are also text files with an encoding. Most XML files use utf-8, but if the encoding is different, this commonly used code no longer works:

    # Do not use.
    [xml]$xml = Get-Content -Encoding UTF8 $xmlFile


    Use this instead:

    # Read XML file.
    [xml]$xml = New-Object xml
    $xml.Load($xmlFile)
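
    Load reads the encoding from the XML declaration of the file. The following sketch shows this with a made-up iso-8859-1 file. Reading the same file with Get-Content -Encoding UTF8 would not decode the 'é' correctly.

    ```powershell
    [string]$xmlFile = [System.IO.Path]::GetTempFileName()

    # Write an XML file in iso-8859-1; the declaration names the encoding.
    # 'é' is built from its code point so the script file encoding does not matter.
    [string]$xmlText = '<?xml version="1.0" encoding="iso-8859-1"?><root>caf' + [char]0xE9 + '</root>'
    [System.IO.File]::WriteAllText($xmlFile, $xmlText, [System.Text.Encoding]::GetEncoding("iso-8859-1"))

    # Load() uses the declared encoding to decode the file.
    $xml = New-Object xml
    $xml.Load($xmlFile)

    [string]$rootText = $xml.DocumentElement.InnerText
    $rootText          # café
    $rootText.Length   # 4

    Remove-Item $xmlFile
    ```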


    The default output file encoding is utf-8 with a BOM:

    # Save xml as utf-8 with signature (BOM).
    $xml.Save($xmlFileOut)


    To not write a BOM, use this:

    # Save xml as utf-8 without BOM.
    $encoding = [System.Text.UTF8Encoding]::new($false)
    $writer = [System.IO.StreamWriter]::new($xmlFileOut, $false, $encoding)
    $xml.Save($writer)
    $writer.Dispose()
  • AI code calculus examples

    📅 11 October 2021 · Software

    Example scripts to calculate the integral and zero points of functions using the AI code programmable calculator for Android.

    import."mathlib"
    
    // Math calculus examples
    
    // Store the function in a variable 'fx1' .
    // -e^(x-2)
    { 2 - e swap ^ -1 * } sto.fx1
    
    // Integral
    0    // start
    2    // end
    0,001    // precision
    rcl.fx1  // f(x) as lambda
    integral
    
    // Zero point, uses the function inline.
    -3    // start
    { sto.x rcl.x dup dup * * rcl.x dup * + rcl.x 4 * - 4 - }    // x^3+x^2-4x-4
    nullstelle
    

    See the preinstalled scripts for more examples.

ABOUT

Jürgen E
Principal Engineer, Villager, and the creative mind behind lots of projects:
Windows Photo Explorer (cpicture-blog), Android apps AI code rpn calculator and Stockroom, vrlight, 3DRoundview and my github

