Borbin the 🐱

C++ std::map using the __m128i type

10 February 2025


In a recent project, I encountered a performance bottleneck while using a map (std::unordered_map) with CString as the key. The keys represented file extensions, each not exceeding seven Unicode characters. Given the performance-critical nature of the loop, the overhead of hashing a CString for such short sequences was too high.

To address this, I used the __m128i data type, a 128-bit integer vector type provided by the SSE2 (Streaming SIMD Extensions 2) compiler intrinsics in <emmintrin.h>. It is a compiler extension, not part of standard C++. A file extension of up to seven characters plus the leading '.' occupies at most 8 UTF-16 code units (16 bytes), so it fits into a single __m128i value.
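As a quick check of the size arithmetic, the following sketch packs an extension into a zero-padded 16-byte buffer. It is a portable illustration, not code from the project: it uses char16_t instead of WCHAR (both are 16-bit UTF-16 code units on Windows), a plain std::array instead of __m128i, and the helper name pack_extension is hypothetical. The byte positions assume a little-endian machine such as x86.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// A 7-character extension plus the leading '.' is 8 UTF-16 code units.
constexpr std::size_t kMaxUnits = 8;

// 8 code units * 2 bytes each = 16 bytes = 128 bits, exactly one __m128i.
static_assert(kMaxUnits * sizeof(char16_t) * 8 == 128, "an extension key fills 128 bits");

// Hypothetical helper: copies at most kMaxUnits code units into a
// zero-padded 16-byte buffer; longer input is truncated.
std::array<std::uint8_t, 16> pack_extension(const char16_t* ext)
{
    std::array<std::uint8_t, 16> bytes{};  // zero-initialized padding
    std::size_t len = 0;
    while (ext[len] != u'\0' && len < kMaxUnits)
        ++len;
    std::memcpy(bytes.data(), ext, len * sizeof(char16_t));
    return bytes;
}
```

On a little-endian machine, ".jpg" occupies the first 8 bytes (2E 00 6A 00 70 00 67 00) and the remaining 8 bytes stay zero.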

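To use __m128i as a map key, custom hash and equality functions need to be defined for the map, because the standard library provides neither a std::hash specialization nor a usable operator== for __m128i.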

Using this data type significantly reduced the overhead and improved the performance of the map operations within the critical loop.


Custom hash and equality functions
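// SSE2 intrinsics (__m128i, _mm_cmpeq_epi32, _mm_movemask_epi8).
#include <emmintrin.h>
#include <cstdint>
#include <functional>

// Custom hash function for __m128i.
struct Hash128i
{
    std::size_t operator()(const __m128i& key) const
    {
        // View the 128-bit key as two 64-bit halves and combine their hashes.
        const uint64_t* data = reinterpret_cast<const uint64_t*>(&key);
        return std::hash<uint64_t>{}(data[0]) ^ std::hash<uint64_t>{}(data[1]);
    }
};

// Custom equality function for __m128i.
struct Equal128i
{
    bool operator()(const __m128i& lhs, const __m128i& rhs) const
    {
        // Compare the four 32-bit lanes; each equal lane becomes all ones.
        const __m128i result = _mm_cmpeq_epi32(lhs, rhs);

        // All 16 byte-mask bits are set only if every lane matched.
        return _mm_movemask_epi8(result) == 0xFFFF;
    }
};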
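The functors can also be exercised outside the Windows project. The sketch below restates them for GCC or Clang on x86-64, where SSE2 is always available. It reads the two 64-bit halves with std::memcpy instead of reinterpret_cast to sidestep strict-aliasing questions, and the alias M128Map is a hypothetical convenience, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <unordered_map>
#include <emmintrin.h>  // SSE2: __m128i, _mm_cmpeq_epi32, _mm_movemask_epi8

// Hash: XOR of the std::hash of the two 64-bit halves, as in the post.
struct Hash128i
{
    std::size_t operator()(const __m128i& key) const
    {
        std::uint64_t halves[2];
        std::memcpy(halves, &key, sizeof(halves));  // no pointer type punning
        return std::hash<std::uint64_t>{}(halves[0]) ^ std::hash<std::uint64_t>{}(halves[1]);
    }
};

// Equality: lane-wise compare, then require all 16 byte-mask bits to be set.
struct Equal128i
{
    bool operator()(const __m128i& lhs, const __m128i& rhs) const
    {
        return _mm_movemask_epi8(_mm_cmpeq_epi32(lhs, rhs)) == 0xFFFF;
    }
};

// Hypothetical convenience alias for a map keyed by __m128i.
template <typename Value>
using M128Map = std::unordered_map<__m128i, Value, Hash128i, Equal128i>;
```

Because XOR is symmetric, two keys whose 64-bit halves are swapped get the same hash. Equal128i still keeps them apart, so the map stays correct; only the bucket distribution is affected.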


Declaration
unordered_map<__m128i, lpfnFormatGetInstanceProc, Hash128i, Equal128i> registered_format_plugins_map_m128;

The project uses a function pointer as the mapped type, but it can be any type.

typedef CPictureFormat* (__stdcall* lpfnFormatGetInstanceProc)();


Map string to the __m128i data type
__m128i CRegisterFormat::str_to_m128i(const WCHAR* obj)
{
    // Converts the first 8 characters of the Unicode string obj into a __m128i.
    // The extension (including the leading '.') contains only a..z, A..Z and 0..9,
    // letters are matched case-insensitively, and it is at most 8 characters long.
    // Longer strings are truncated to their first 8 characters.
    // A NULL pointer (no extension) is treated as an empty string.
    const size_t len = obj ? wcslen(obj) : 0;

    char buffer[16] = { 0 };
    if (len > 0)
        memcpy(buffer, obj, std::min<size_t>(sizeof(buffer), len * sizeof(WCHAR)));

    // Initialize __m128i with the char array (unaligned load).
    const __m128i ext = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buffer));

    // Case-insensitive mapping.
    // The extension data is strictly A-Z0-9, so converting it to lowercase can be done with a single vectorized bitwise OR with 0x20 (obj | 0x20). This moves A-Z to a-z while keeping 0-9 and '.', which already have this bit set. The zero padding and the zero high byte of each WCHAR become 0x20 as well, which is harmless because every key is transformed the same way.
    // Create a __m128i variable with all bytes set to 0x20.
    const static __m128i mask = _mm_set1_epi8(0x20);

    // Perform bitwise OR operation on all bytes.
    return _mm_or_si128(ext, mask);
}
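To see why a single OR with 0x20 is enough, here is a scalar model of what _mm_or_si128 does to each byte of the key. fold_byte and fold_unit are hypothetical helpers for illustration only; the checks run at compile time via static_assert.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one byte of _mm_or_si128(ext, _mm_set1_epi8(0x20)).
constexpr std::uint8_t fold_byte(std::uint8_t b)
{
    return static_cast<std::uint8_t>(b | 0x20);
}

// Applies fold_byte to both bytes of a UTF-16 code unit, as the
// 16-byte OR does for each WCHAR of the key.
constexpr std::uint16_t fold_unit(std::uint16_t u)
{
    return static_cast<std::uint16_t>(fold_byte(static_cast<std::uint8_t>(u & 0xFF)) |
                                      (fold_byte(static_cast<std::uint8_t>(u >> 8)) << 8));
}

// A-Z (0x41-0x5A) gains bit 0x20 and becomes a-z (0x61-0x7A).
static_assert(fold_byte('J') == 'j', "upper case folds to lower case");
// a-z, 0-9 (0x30-0x39) and '.' (0x2E) already have bit 0x20 set.
static_assert(fold_byte('j') == 'j' && fold_byte('7') == '7' && fold_byte('.') == '.',
              "lower case, digits and '.' are unchanged");
// Zero bytes (padding, high byte of ASCII code units) become 0x20 in every key.
static_assert(fold_byte(0x00) == 0x20, "zero bytes fold consistently");
static_assert(fold_unit(u'J') == 0x206A, "'J' as UTF-16 folds to 0x206A");
```

The folded key is no longer a readable lowercase string, because every zero byte turns into 0x20 (a space). That is fine for a lookup key: registration and lookup apply the same transformation, so equal extensions still produce equal keys.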


Example usage
// Adding a new file extension with the associated function pointer for the file type.
// emplace() leaves an existing entry unchanged, so the first registration wins.
const __m128i key(str_to_m128i(ext));
registered_format_plugins_map_m128.emplace(key, fp);

// Implement the format factory.
CPictureFormat* CRegisterFormat::GetInstance(const WCHAR* obj)
{
    const WCHAR* ext(wcsrchr(obj, L'.'));
    if (ext == NULL)
        return NULL;

    // Use find() so that unknown extensions do not insert empty entries into the map.
    const auto it = registered_format_plugins_map_m128.find(str_to_m128i(ext));
    if (it != registered_format_plugins_map_m128.end() && it->second)
        return it->second();

    return NULL;
}

// Compare two extensions to check if they share the same group defined by matching function pointer.
bool CRegisterFormat::IsDifferentFormat(const WCHAR* obj1, const WCHAR* obj2)
{
    // Get the file extensions.
    const WCHAR* ext1(wcsrchr(obj1, L'.'));
    const WCHAR* ext2(wcsrchr(obj2, L'.'));

    // Only one of the files has an extension.
    if ((ext1 == NULL) != (ext2 == NULL))
        return true;

    // Neither file has an extension.
    if (ext1 == NULL)
        return false;

    return registered_format_plugins_map_m128[str_to_m128i(ext1)] != registered_format_plugins_map_m128[str_to_m128i(ext2)];
}
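Putting the pieces together, the sketch below is a portable, self-contained variant of the registry for GCC or Clang on x86-64. It is an assumption-laden stand-in, not the project's code: char16_t replaces WCHAR, the hypothetical factories make_jpeg and make_png return an int instead of a CPictureFormat*, and FormatRegistry, ext_to_key and FactoryFn are illustrative names. Lookups use find() so unknown extensions are not inserted into the map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <unordered_map>
#include <emmintrin.h>  // SSE2 intrinsics

// Hash and equality functors as in the post (halves read with memcpy).
struct Hash128i
{
    std::size_t operator()(const __m128i& key) const
    {
        std::uint64_t halves[2];
        std::memcpy(halves, &key, sizeof(halves));
        return std::hash<std::uint64_t>{}(halves[0]) ^ std::hash<std::uint64_t>{}(halves[1]);
    }
};

struct Equal128i
{
    bool operator()(const __m128i& lhs, const __m128i& rhs) const
    {
        return _mm_movemask_epi8(_mm_cmpeq_epi32(lhs, rhs)) == 0xFFFF;
    }
};

// Hypothetical portable counterpart of str_to_m128i: packs up to 8 UTF-16
// code units (zero padded) and ORs every byte with 0x20 for case folding.
__m128i ext_to_key(const char16_t* ext)
{
    unsigned char buffer[16] = {};
    std::size_t len = 0;
    while (ext[len] != u'\0' && len < 8)
        ++len;
    std::memcpy(buffer, ext, len * sizeof(char16_t));
    const __m128i raw = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buffer));
    return _mm_or_si128(raw, _mm_set1_epi8(0x20));
}

// Hypothetical plugin factories standing in for lpfnFormatGetInstanceProc.
using FactoryFn = int (*)();
int make_jpeg() { return 1; }
int make_png() { return 2; }

class FormatRegistry
{
public:
    // emplace() keeps an existing entry, so the first registration wins.
    void Register(const char16_t* ext, FactoryFn fn) { map_.emplace(ext_to_key(ext), fn); }

    // Returns the factory result, or 0 for a missing or unknown extension.
    int GetInstance(const char16_t* path) const
    {
        const FactoryFn fn = Lookup(Extension(path));
        return fn ? fn() : 0;
    }

    bool IsDifferentFormat(const char16_t* path1, const char16_t* path2) const
    {
        const char16_t* ext1 = Extension(path1);
        const char16_t* ext2 = Extension(path2);
        if ((ext1 == nullptr) != (ext2 == nullptr))
            return true;  // only one of the files has an extension
        return Lookup(ext1) != Lookup(ext2);
    }

private:
    // Same result as wcsrchr(path, '.'): the last '.', or nullptr.
    static const char16_t* Extension(const char16_t* path)
    {
        const char16_t* dot = nullptr;
        for (; *path != u'\0'; ++path)
            if (*path == u'.')
                dot = path;
        return dot;
    }

    // find() avoids inserting empty entries for unknown extensions.
    FactoryFn Lookup(const char16_t* ext) const
    {
        if (ext == nullptr)
            return nullptr;
        const auto it = map_.find(ext_to_key(ext));
        return it == map_.end() ? nullptr : it->second;
    }

    std::unordered_map<__m128i, FactoryFn, Hash128i, Equal128i> map_;
};
```

Registering both ".jpg" and ".jpeg" with the same factory puts them in one group, so IsDifferentFormat treats "a.jpg" and "b.JPEG" as the same format, while "a.jpg" and "b.png" differ.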



Kirkland Piers ᚅ

19 January 2025


Sunny Sunday afternoon at the Kirkland Piers.


Interactive Panorama Kirkland Pier 1


1/800s f/5.6 ISO 100/21° f=7.5mm



Hidden, but in plain sight. This pier is next to the popular Marina Park, but since its entrance is hidden from the street, it was empty.
The long shadow (picture #4 in the index) was covered by a nadir shot (picture #8 in the index) taken off-center. Almost: one piece of the shadow is still visible on the side of the pier.

This panorama is the spring 2025 contribution to the WorldWidePanorama event 'hidden'.


Interactive Panorama Kirkland Pier 2


1/1000s f/5.6 ISO 100/21° f=7.5mm


Sauron is watching you.



1/320s f/6.3 ISO 100/21° 16-50mm f/3.5-6.3 VR f=16mm/24mm




1/320s f/6.3 ISO 100/21° 16-50mm f/3.5-6.3 VR f=16mm/24mm




1/320s f/6.3 ISO 100/21° 16-50mm f/3.5-6.3 VR f=16mm/24mm




PTGui vs HeliconFocus

29 December 2024


Focus stacking involves taking multiple photos at different focus points and merging them into a single image that contains only the sharpest areas.

Three images, each focused on a different point (foreground, middle ground, and background), were taken and then merged using PTGui and HeliconFocus.


PTGui

PTGui automatically established control points and precisely aligned all the images, performing as expected for a panoramic image application. Using a simple mask, the three in-focus areas were seamlessly integrated.



HeliconFocus

Since all the images were taken handheld, there are slight variations in perspective. This is evident in the results from HeliconFocus. It is likely that a dedicated focus stacking application expects images to be captured using a tripod for optimal results.



Isn't it surprising how well a panorama stitching app performs compared to a dedicated focus stacking app?


The Making of the test picture


See Combine pictures with PTGui, Focus stacking



Overlake Hospital, Bellevue, WA

21 December 2024


Taken at the 2024 Winter Solstice ♑︎, the shortest day of the year.

This panorama is the 2024 Winter Solstice contribution for the 'December Wrinkle' event of WorldWidePanorama.

Interactive Panorama Overlake Hospital


I took this panorama about an hour before sunset without a tripod while waiting at the main exit. The main challenge was aligning the pathway with the building's geometric facade.
Taking more pictures than necessary helps with panoramas because it allows for better adjustments and alignment.
Using the mask feature in PTGui is another significant advantage. It lets you force the seamlines into less prominent areas, which minimizes visible overlaps and hides misalignment.

1/60s f/5.6 ISO 100/21° 7.5mm


Only the central parts of the images were used, which minimized errors. Note the nearly uniform size of the horizontal image parts, with the exception of the pair on the left side of the large building.


At one point, I had this rare PTGui result:


