Create a CharSetConverter class with both iconv and icu support #74516
@llvm/pr-subscribers-llvm-support
Author: Abhina Sree (abhina-sree)
Changes
This patch adds a wrapper class called CharSetConverter for ConverterEBCDIC. This class is then extended to support the ICU library or iconv library. The ICU library currently takes priority over the iconv library.
Patch is 32.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/74516.diff
7 Files Affected:
diff --git a/llvm/cmake/config-ix.cmake b/llvm/cmake/config-ix.cmake
index 7bb3e98333eff..b2505968e430d 100644
--- a/llvm/cmake/config-ix.cmake
+++ b/llvm/cmake/config-ix.cmake
@@ -257,6 +257,22 @@ else()
set(LLVM_ENABLE_TERMINFO 0)
endif()
+#Check for icu.
+find_package(ICU COMPONENTS uc i18n)
+if(ICU_FOUND)
+ set(HAVE_ICU 1)
+else()
+ set(HAVE_ICU 0)
+endif()
+
+# Check for iconv.
+find_package(Iconv)
+if(Iconv_FOUND)
+ set(HAVE_ICONV 1)
+else()
+ set(HAVE_ICONV 0)
+endif()
+
# function checks
check_symbol_exists(arc4random "stdlib.h" HAVE_DECL_ARC4RANDOM)
find_package(Backtrace)
diff --git a/llvm/include/llvm/Config/config.h.cmake b/llvm/include/llvm/Config/config.h.cmake
index fc1f9bf342f8d..74003e1b22494 100644
--- a/llvm/include/llvm/Config/config.h.cmake
+++ b/llvm/include/llvm/Config/config.h.cmake
@@ -281,6 +281,12 @@
/* Have host's ___chkstk_ms */
#cmakedefine HAVE____CHKSTK_MS ${HAVE____CHKSTK_MS}
+/* Define if icu library is available */
+#cmakedefine HAVE_ICU ${HAVE_ICU}
+
+/* Define if iconv library is available */
+#cmakedefine HAVE_ICONV ${HAVE_ICONV}
+
/* Linker version detected at compile time. */
#cmakedefine HOST_LINK_VERSION "${HOST_LINK_VERSION}"
diff --git a/llvm/include/llvm/Support/CharSet.h b/llvm/include/llvm/Support/CharSet.h
new file mode 100644
index 0000000000000..856b3be65ff7e
--- /dev/null
+++ b/llvm/include/llvm/Support/CharSet.h
@@ -0,0 +1,160 @@
+//===-- CharSet.h - Utility class to convert between char sets ----*- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file provides a utility class to convert between different character
+/// set encodings.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_CHARSET_H
+#define LLVM_SUPPORT_CHARSET_H
+
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Config/config.h"
+#include "llvm/Support/ErrorOr.h"
+
+#include <functional>
+#include <string>
+#include <system_error>
+
+namespace llvm {
+
+template <typename T> class SmallVectorImpl;
+
+namespace details {
+class CharSetConverterImplBase {
+public:
+ virtual ~CharSetConverterImplBase() = default;
+
+ /// Converts a string.
+ /// \param[in] Source source string
+ /// \param[in,out] Result container for converted string
+ /// \param[in] ShouldAutoFlush Append shift-back sequence after conversion
+ /// for multi-byte encodings iff true.
+ /// \return error code in case something went wrong
+ ///
+ /// The following error codes can occur, among others:
+ /// - std::errc::argument_list_too_long: The result requires more than
+ /// std::numeric_limits<size_t>::max() bytes.
+ /// - std::errc::illegal_byte_sequence: The input contains an invalid
+ /// multibyte sequence.
+ /// - std::errc::invalid_argument: The input contains an incomplete
+ /// multibyte sequence.
+ ///
+ /// In case of an error, the result string contains the successfully converted
+ /// part of the input string.
+ ///
+
+ virtual std::error_code convert(StringRef Source,
+ SmallVectorImpl<char> &Result,
+ bool ShouldAutoFlush) const = 0;
+
+ /// Restore the conversion to the original state.
+ /// \return error code in case something went wrong
+ ///
+ /// If the original character set or the destination character set
+ /// are multi-byte character sets, set the shift state to the initial
+ /// state. Otherwise this is a no-op.
+ virtual std::error_code flush() const = 0;
+
+ virtual std::error_code flush(SmallVectorImpl<char> &Result) const = 0;
+};
+} // namespace details
+
+// Names inspired by https://wg21.link/p1885.
+namespace text_encoding {
+enum class id {
+ /// UTF-8 character set encoding.
+ UTF8,
+
+ /// IBM EBCDIC 1047 character set encoding.
+ IBM1047
+};
+} // end namespace text_encoding
+
+/// Utility class to convert between different character set encodings.
+/// The class always supports converting between EBCDIC 1047 and Latin-1/UTF-8.
+class CharSetConverter {
+ // details::CharSetConverterImplBase *Converter;
+ std::unique_ptr<details::CharSetConverterImplBase> Converter;
+
+ CharSetConverter(std::unique_ptr<details::CharSetConverterImplBase> Converter)
+ : Converter(std::move(Converter)) {}
+
+public:
+ /// Creates a CharSetConverter instance.
+ /// \param[in] CSFrom name of the source character encoding
+ /// \param[in] CSTo name of the target character encoding
+ /// \return a CharSetConverter instance
+ static CharSetConverter create(text_encoding::id CSFrom,
+ text_encoding::id CSTo);
+
+ /// Creates a CharSetConverter instance.
+ /// Returns std::errc::invalid_argument in case the requested conversion is
+ /// not supported.
+ /// \param[in] CPFrom name of the source character encoding
+ /// \param[in] CPTo name of the target character encoding
+ /// \return a CharSetConverter instance or an error code
+ static ErrorOr<CharSetConverter> create(StringRef CPFrom, StringRef CPTo);
+
+ CharSetConverter(const CharSetConverter &) = delete;
+ CharSetConverter &operator=(const CharSetConverter &) = delete;
+
+ CharSetConverter(CharSetConverter &&Other) {
+ Converter = std::move(Other.Converter);
+ }
+
+ CharSetConverter &operator=(CharSetConverter &&Other) {
+ if (this != &Other)
+ Converter = std::move(Other.Converter);
+ return *this;
+ }
+
+ ~CharSetConverter() = default;
+
+ /// Converts a string.
+ /// \param[in] Source source string
+ /// \param[in,out] Result container for converted string
+ /// \param[in] ShouldAutoFlush Append shift-back sequence after conversion
+ /// for multi-byte encodings.
+ /// \return error code in case something went wrong
+ std::error_code convert(StringRef Source, SmallVectorImpl<char> &Result,
+ bool ShouldAutoFlush = true) const {
+ return Converter->convert(Source, Result, ShouldAutoFlush);
+ }
+
+ char convert(char SingleChar) const {
+ SmallString<1> Result;
+ Converter->convert(StringRef(&SingleChar, 1), Result, false);
+ return Result[0];
+ }
+
+ /// Converts a string.
+ /// \param[in] Source source string
+ /// \param[in,out] Result container for converted string
+ /// \param[in] ShouldAutoFlush Append shift-back sequence after conversion
+ /// for multi-byte encodings iff true.
+ /// \return error code in case something went wrong
+ std::error_code convert(const std::string &Source,
+ SmallVectorImpl<char> &Result,
+ bool ShouldAutoFlush = true) const {
+ return convert(StringRef(Source), Result, ShouldAutoFlush);
+ }
+
+ std::error_code flush() const { return Converter->flush(); }
+
+ std::error_code flush(SmallVectorImpl<char> &Result) const {
+ return Converter->flush(Result);
+ }
+};
+
+} // namespace llvm
+
+#endif
diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index b96d62c7a6224..b366b915df719 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -153,6 +153,7 @@ add_llvm_component_library(LLVMSupport
CachePruning.cpp
Caching.cpp
circular_raw_ostream.cpp
+ CharSet.cpp
Chrono.cpp
COM.cpp
CodeGenCoverage.cpp
@@ -291,6 +292,22 @@ add_llvm_component_library(LLVMSupport
Demangle
)
+# Link icu library if it is an external library.
+if(ICU_FOUND)
+ target_link_libraries(LLVMSupport
+ PRIVATE
+ ${ICU_LIBRARIES}
+ )
+else()
+ # Link iconv library if it is an external library.
+ if(Iconv_FOUND AND NOT Iconv_IS_BUILT_IN)
+ target_link_libraries(LLVMSupport
+ PRIVATE
+ ${Iconv_LIBRARIES}
+ )
+ endif()
+endif()
+
set(llvm_system_libs ${system_libs})
# This block is only needed for llvm-config. When we deprecate llvm-config and
diff --git a/llvm/lib/Support/CharSet.cpp b/llvm/lib/Support/CharSet.cpp
new file mode 100644
index 0000000000000..dbc2cb7c1839d
--- /dev/null
+++ b/llvm/lib/Support/CharSet.cpp
@@ -0,0 +1,370 @@
+//===-- CharSet.cpp - Utility class to convert between char sets --*- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file provides utility classes to convert between different character
+/// set encoding.
+///
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Support/CharSet.h"
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/Support/ConvertEBCDIC.h"
+#include "llvm/Support/raw_ostream.h"
+#include <algorithm>
+#include <limits>
+#include <system_error>
+
+#ifdef HAVE_ICU
+#include <unicode/ucnv.h>
+#elif defined(HAVE_ICONV)
+#include <iconv.h>
+#endif
+
+using namespace llvm;
+
+// Normalize the charset name with the charset alias matching algorithm proposed
+// in https://www.unicode.org/reports/tr22/tr22-8.html#Charset_Alias_Matching.
+void normalizeCharSetName(StringRef CSName, SmallVectorImpl<char> &Normalized) {
+ bool PrevDigit = false;
+ for (auto Ch : CSName) {
+ if (isAlnum(Ch)) {
+ Ch = toLower(Ch);
+ if (Ch != '0' || PrevDigit) {
+ PrevDigit = isDigit(Ch);
+ Normalized.push_back(Ch);
+ }
+ }
+ }
+}
+
+// Maps the charset name to enum constant if possible.
+std::optional<text_encoding::id> getKnownCharSet(StringRef CSName) {
+ SmallString<16> Normalized;
+ normalizeCharSetName(CSName, Normalized);
+#define CSNAME(CS, STR) \
+ if (Normalized.equals(STR)) \
+ return CS
+ CSNAME(text_encoding::id::UTF8, "utf8");
+ CSNAME(text_encoding::id::IBM1047, "ibm1047");
+#undef CSNAME
+ return std::nullopt;
+}
+
+namespace {
+enum ConversionType {
+ UTFToIBM1047,
+ IBM1047ToUTF,
+};
+
+// Support conversion between EBCDIC 1047 and UTF8. This class uses
+// built-in translation tables that allow for translation between the
+// aforementioned character sets. The use of tables for conversion is only
+// possible because EBCDIC 1047 is a single-byte, stateless encoding; other
+// character sets are not supported.
+class CharSetConverterTable : public details::CharSetConverterImplBase {
+ ConversionType ConvType;
+
+public:
+ CharSetConverterTable(ConversionType ConvType) : ConvType(ConvType) {}
+
+ std::error_code convert(StringRef Source, SmallVectorImpl<char> &Result,
+ bool ShouldAutoFlush) const override;
+ std::error_code flush() const override;
+ std::error_code flush(SmallVectorImpl<char> &Result) const override;
+};
+
+std::error_code CharSetConverterTable::convert(StringRef Source,
+ SmallVectorImpl<char> &Result,
+ bool ShouldAutoFlush) const {
+ if (ConvType == IBM1047ToUTF) {
+ ConverterEBCDIC::convertToUTF8(Source, Result);
+ return std::error_code();
+ } else if (ConvType == UTFToIBM1047) {
+ return ConverterEBCDIC::convertToEBCDIC(Source, Result);
+ }
+ llvm_unreachable("Invalid ConvType!");
+ return std::error_code();
+}
+
+std::error_code CharSetConverterTable::flush() const {
+ return std::error_code();
+}
+
+std::error_code
+CharSetConverterTable::flush(SmallVectorImpl<char> &Result) const {
+ return std::error_code();
+}
+
+#ifdef HAVE_ICU
+class CharSetConverterICU : public details::CharSetConverterImplBase {
+ UConverter *FromConvDesc;
+ UConverter *ToConvDesc;
+
+public:
+ CharSetConverterICU(UConverter *Converter) {
+ UErrorCode EC = U_ZERO_ERROR;
+ FromConvDesc = nullptr;
+ ToConvDesc = ucnv_safeClone(Converter, nullptr, nullptr, &EC);
+ if (U_FAILURE(EC)) {
+ ToConvDesc = nullptr;
+ }
+ };
+
+ CharSetConverterICU(UConverter *FromConverter, UConverter *ToConverter) {
+ UErrorCode EC = U_ZERO_ERROR;
+ FromConvDesc = ucnv_safeClone(FromConverter, nullptr, nullptr, &EC);
+ if (U_FAILURE(EC))
+ FromConvDesc = nullptr;
+ ToConvDesc = ucnv_safeClone(ToConverter, nullptr, nullptr, &EC);
+ if (U_FAILURE(EC))
+ ToConvDesc = nullptr;
+ }
+
+ std::error_code convert(StringRef Source, SmallVectorImpl<char> &Result,
+ bool ShouldAutoFlush) const override;
+ std::error_code flush() const override;
+ std::error_code flush(SmallVectorImpl<char> &Result) const override;
+};
+
+std::error_code CharSetConverterICU::convert(StringRef Source,
+ SmallVectorImpl<char> &Result,
+ bool ShouldAutoFlush) const {
+ // Setup the output. We directly write into the SmallVector.
+ size_t OutputLength, Capacity = Result.capacity();
+ char *Output, *Out;
+
+ UErrorCode EC = U_ZERO_ERROR;
+
+ auto HandleError = [&Capacity, &Output, &OutputLength,
+ &Result](UErrorCode UEC) {
+ if (UEC == U_BUFFER_OVERFLOW_ERROR &&
+ Capacity < std::numeric_limits<size_t>::max()) {
+ // No space left in output buffer. Double the size of the underlying
+ // memory in the SmallVectorImpl, adjust pointer and length and continue
+ // the conversion.
+ Capacity = (Capacity < std::numeric_limits<size_t>::max() / 2)
+ ? 2 * Capacity
+ : std::numeric_limits<size_t>::max();
+ Result.resize_for_overwrite(Capacity);
+ Output = static_cast<char *>(Result.data());
+ OutputLength = Capacity;
+ return std::error_code();
+ } else {
+ // Some other error occurred.
+ return std::error_code(errno, std::generic_category());
+ }
+ };
+
+ do {
+ EC = U_ZERO_ERROR;
+ size_t InputLength = Source.size();
+ const char *Input =
+ InputLength ? const_cast<char *>(Source.data()) : nullptr;
+ const char *In = Input;
+ Output = InputLength ? static_cast<char *>(Result.data()) : nullptr;
+ OutputLength = Capacity;
+ Out = Output;
+ Result.resize_for_overwrite(Capacity);
+ ucnv_convertEx(ToConvDesc, FromConvDesc, &Output, Out + OutputLength,
+ &Input, In + InputLength, /*pivotStart=*/NULL,
+ /*pivotSource=*/NULL, /*pivotTarget=*/NULL,
+ /*pivotLimit=*/NULL, /*reset=*/true, /*flush=*/true, &EC);
+ if (U_FAILURE(EC)) {
+ if (auto error = HandleError(EC))
+ return error;
+ } else if (U_SUCCESS(EC))
+ break;
+ } while (U_FAILURE(EC));
+
+ Result.resize(Output - Out);
+ return std::error_code();
+}
+
+std::error_code CharSetConverterICU::flush() const { return std::error_code(); }
+
+std::error_code
+CharSetConverterICU::flush(SmallVectorImpl<char> &Result) const {
+ return std::error_code();
+}
+
+#elif defined(HAVE_ICONV)
+class CharSetConverterIconv : public details::CharSetConverterImplBase {
+ iconv_t ConvDesc;
+
+public:
+ CharSetConverterIconv(iconv_t ConvDesc) : ConvDesc(ConvDesc) {}
+
+ std::error_code convert(StringRef Source, SmallVectorImpl<char> &Result,
+ bool ShouldAutoFlush) const override;
+ std::error_code flush() const override;
+ std::error_code flush(SmallVectorImpl<char> &Result) const override;
+};
+
+std::error_code CharSetConverterIconv::convert(StringRef Source,
+ SmallVectorImpl<char> &Result,
+ bool ShouldAutoFlush) const {
+ // Setup the input. Use nullptr to reset iconv state if input length is zero.
+ size_t InputLength = Source.size();
+ char *Input = InputLength ? const_cast<char *>(Source.data()) : nullptr;
+ // Setup the output. We directly write into the SmallVector.
+ size_t Capacity = Result.capacity();
+ Result.resize_for_overwrite(Capacity);
+ char *Output = InputLength ? static_cast<char *>(Result.data()) : nullptr;
+ size_t OutputLength = Capacity;
+
+ size_t Ret;
+
+ // Handle errors returned from iconv().
+ auto HandleError = [&Capacity, &Output, &OutputLength, &Result](size_t Ret) {
+ if (Ret == static_cast<size_t>(-1)) {
+ // An error occurred. Check if we can gracefully handle it.
+ if (errno == E2BIG && Capacity < std::numeric_limits<size_t>::max()) {
+ // No space left in output buffer. Double the size of the underlying
+ // memory in the SmallVectorImpl, adjust pointer and length and continue
+ // the conversion.
+ const size_t Used = Capacity - OutputLength;
+ Capacity = (Capacity < std::numeric_limits<size_t>::max() / 2)
+ ? 2 * Capacity
+ : std::numeric_limits<size_t>::max();
+ Result.resize_for_overwrite(Capacity);
+ Output = static_cast<char *>(Result.data()) + Used;
+ OutputLength = Capacity - Used;
+ return std::error_code();
+ } else {
+ // Some other error occurred.
+ return std::error_code(errno, std::generic_category());
+ }
+ } else {
+ // A positive return value indicates that some characters were converted
+ // in a nonreversible way, that is, replaced with a SUB symbol. Returning
+ // an error in this case makes sure that both conversion routines behave
+ // in the same way.
+ return std::make_error_code(std::errc::illegal_byte_sequence);
+ }
+ };
+
+ // Convert the string.
+ while ((Ret = iconv(ConvDesc, &Input, &InputLength, &Output, &OutputLength)))
+ if (auto EC = HandleError(Ret))
+ return EC;
+ if (ShouldAutoFlush) {
+ while ((Ret = iconv(ConvDesc, nullptr, nullptr, &Output, &OutputLength)))
+ if (auto EC = HandleError(Ret))
+ return EC;
+ }
+
+ // Re-adjust size to actual size.
+ Result.resize(Capacity - OutputLength);
+ return std::error_code();
+}
+
+std::error_code CharSetConverterIconv::flush() const {
+ size_t Ret = iconv(ConvDesc, nullptr, nullptr, nullptr, nullptr);
+ if (Ret == static_cast<size_t>(-1)) {
+ return std::error_code(errno, std::generic_category());
+ }
+ return std::error_code();
+}
+
+std::error_code
+CharSetConverterIconv::flush(SmallVectorImpl<char> &Result) const {
+ char *Output = Result.data();
+ size_t OutputLength = Result.capacity();
+ size_t Capacity = Result.capacity();
+ Result.resize_for_overwrite(Capacity);
+
+ // Handle errors returned from iconv().
+ auto HandleError = [&Capacity, &Output, &OutputLength, &Result](size_t Ret) {
+ if (Ret == static_cast<size_t>(-1)) {
+ // An error occurred. Check if we can gracefully handle it.
+ if (errno == E2BIG && Capacity < std::numeric_limits<size_t>::max()) {
+ // No space left in output buffer. Increase the size of the underlying
+ // memory in the SmallVectorImpl by 2 bytes, adjust pointer and length
+ // and continue the conversion.
+ const size_t Used = Capacity - OutputLength;
+ Capacity = (Capacity < std::numeric_limits<size_t>::max() - 2)
+ ? 2 + Capacity
+ : std::numeric_limits<size_t>::max();
+ Result.resize_for_overwrite(Capacity);
+ Output = static_cast<char *>(Result.data()) + Used;
+ OutputLength = Capacity - Used;
+ return std::error_code();
+ } else {
+ // Some other error occurred.
+ return std::error_code(errno, std::generic_category());
+ }
+ } else {
+ // A positive return value indicates that some characters were converted
+ // in a nonreversible way, that is, replaced with a SUB symbol. Returning
+ // an error in this case makes sure that both conversion routines behave
+ // in the same way.
+ return std::make_error_code(std::errc::illegal_byte_sequence);
+ }
+ };
+
+ size_t Ret;
+ while ((Ret = iconv(ConvDesc, nullptr, nullptr, &Output, &OutputLength)))
+ if (auto EC = HandleError(Ret))
+ return EC;
+
+ // Re-adjust size to actual size.
+ Result.resize(Capacity - OutputLength);
+ return std::error_code();
+}
...
[truncated]
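For readers following the truncated diff, the name matching done by normalizeCharSetName can be tried outside of LLVM. The sketch below is a standalone approximation that uses std::string, std::string_view and <cctype> in place of StringRef, SmallVectorImpl and the StringExtras helpers; that substitution is an assumption made for illustration and is not code from the patch. It keeps only ASCII letters and digits, lowercases them, and drops a '0' unless the previously kept character was a digit.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Standalone approximation of the alias-matching normalization in the diff:
// keep letters and digits, lowercase them, and drop any '0' that does not
// follow a kept digit. Assumes the default "C" locale for <cctype>.
std::string normalizeCharSetName(std::string_view Name) {
  std::string Normalized;
  bool PrevDigit = false;
  for (char C : Name) {
    unsigned char Ch = static_cast<unsigned char>(C);
    if (!std::isalnum(Ch))
      continue; // separators such as '-', '_' and '.' are ignored
    Ch = static_cast<unsigned char>(std::tolower(Ch));
    if (Ch != '0' || PrevDigit) {
      // Only a kept character updates PrevDigit, as in the diff.
      PrevDigit = std::isdigit(Ch) != 0;
      Normalized.push_back(static_cast<char>(Ch));
    }
  }
  return Normalized;
}
```

With this logic "UTF-8", "utf8" and "u.t.f-008" all normalize to "utf8", and "IBM-1047" normalizes to "ibm1047", while "utf-80" stays distinct as "utf80".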
✅ With the latest revision this PR passed the C/C++ code formatter.
Adding some more reviewers to cover cmake changes, packaging questions, and folks who expressed opinions on the RFC.
A link back to the RFC for those playing along at home: https://discourse.llvm.org/t/rfc-adding-a-charset-converter-to-the-llvm-support-library/69795
Thank you, that is the link to the first RFC. A continuation of that discussion can also be found here on my RFC for implementing the fexec-charset option: https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512
Thanks for working on this.
I do believe the patch implements the resolution of the RFC and generally looks good.
if (U_FAILURE(EC)) {
return std::error_code(errno, std::generic_category());
}
I wonder whether that diagnostic is sufficient for higher level concerns.
But maybe it is, and we can instead have an additional interface that lists the available encodings.
I remember there was a comment on my old patch about checking whether we can actually create a converter when we use the fexec-charset option and emitting an error in the driver. Maybe that can be an option as well.
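One way to make the failure more descriptive, and to let a driver check an encoding name before any conversion is attempted, is a dedicated error category plus a queryable list of names. The following is only a sketch under stated assumptions: ConvErrc, convCategory, SupportedEncodings and checkEncoding are hypothetical names that do not appear in the patch, and the hard-coded list stands in for whatever ICU or iconv would actually report.

```cpp
#include <array>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical error codes; none of these names exist in the patch.
enum class ConvErrc { UnknownEncoding = 1, ConversionFailed };

class ConvErrorCategory : public std::error_category {
public:
  const char *name() const noexcept override { return "charset-converter"; }
  std::string message(int Code) const override {
    switch (static_cast<ConvErrc>(Code)) {
    case ConvErrc::UnknownEncoding:
      return "unknown or unsupported encoding";
    case ConvErrc::ConversionFailed:
      return "conversion failed";
    }
    return "unrecognized error";
  }
};

const std::error_category &convCategory() {
  static ConvErrorCategory Instance;
  return Instance;
}

std::error_code make_error_code(ConvErrc E) {
  return {static_cast<int>(E), convCategory()};
}

// Hypothetical, hard-coded list of normalized names. A real backend would
// have to ask the conversion library which encodings it provides.
constexpr std::array<std::string_view, 2> SupportedEncodings = {"utf8",
                                                                "ibm1047"};

// Returns success if the (already normalized) name is known, otherwise a
// distinct error that a caller such as a driver can turn into a diagnostic.
std::error_code checkEncoding(std::string_view Normalized) {
  for (std::string_view Known : SupportedEncodings)
    if (Known == Normalized)
      return {};
  return make_error_code(ConvErrc::UnknownEncoding);
}
```

A caller could run checkEncoding once when the option is parsed and report the message together with the contents of SupportedEncodings, instead of relying on errno after a library call that may not set it.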
ErrorOr<CharSetConverter> CharSetConverter::create(StringRef CSFrom,
StringRef CSTo) {
We might be better off taking const std::string & here, given we need null termination in the common case.
I'm not sure if you saw my fexec-charset PR abhina-sree#1, but it's mostly used in LiteralSupport.cpp, where there is usually no null terminator when we are doing the translation.
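To illustrate the trade-off in this thread: a non-owning view of a substring (StringRef in LLVM, std::string_view in the sketch) is not guaranteed to be null-terminated, so a copy is needed before the name reaches a C API. In the sketch below, cApiTakingCString is a hypothetical stand-in for a converter-open call; it is not an ICU or iconv function and only measures the string it is given.

```cpp
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical stand-in for a C API that expects a null-terminated name.
// It simply reports how many bytes it reads before the terminator.
std::size_t cApiTakingCString(const char *Name) { return std::strlen(Name); }

// Accepts a non-owning view and makes the null-terminated copy internally.
std::size_t openByName(std::string_view Name) {
  // Name may point into the middle of a larger buffer, so Name.data() is
  // not guaranteed to end where the view ends.
  std::string Terminated(Name); // the copy owns a terminator
  return cApiTakingCString(Terminated.c_str());
}
```

Passing the view's data() pointer directly would let the C function read past the intended five bytes of a name such as "UTF-8" embedded in a longer buffer, which is why the choice is between copying inside create or asking callers for a std::string.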
2d735d9 to 6e0b77b
cf6559f to 0ef0947
ping :)
llvm/cmake/config-ix.cmake
Outdated
@@ -257,6 +257,26 @@ else()
set(LLVM_ENABLE_TERMINFO 0)
endif()
#Check for icu.
if(LLVM_ENABLE_ICU)
find_package(ICU COMPONENTS uc i18n)
https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512/30 distinguishes between statically and dynamically linking against ICU; which one does this do?
I think I will need more work for this, thanks for catching! In my local testing I only had the shared library but I think the current implementation will allow static linking.
I looked into implementing this by setting the following cmake variable CMAKE_FIND_LIBRARY_SUFFIXES so that it will only find libraries ending in .so for ICU. Please let me know if there is a better way to do this. Thanks!
I can't think of anything better.
This won't work on Windows.
0ef0947 to e2fc524
e2fc524 to 9d0371a
ping :)
9d0371a to 00bbc15
ping :)
Co-authored-by: Eli Friedman <[email protected]>
Remove comment that looks like code (unique_ptr should be easy enough to understand). Co-authored-by: Hubert Tong <[email protected]>
…y check for iconv if icu was not found
734c25a to 911fa31
911fa31 to cfd8e5d
Hi @cor3ntin, @efriedma-quic, @tahonermann, would you be able to review this patch again?
…upport. (#138893)
This patch adds a wrapper class called EncodingConverter for ConverterEBCDIC. This class is then extended to support the ICU library or iconv library. The ICU library currently takes priority over the iconv library.
Relevant RFCs:
https://discourse.llvm.org/t/rfc-adding-a-charset-converter-to-the-llvm-support-library/69795
https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512
Stacked PR to enable fexec-charset that depends on this: llvm/llvm-project#138895
See old PR for review and commit history: llvm/llvm-project#74516
This patch adds a wrapper class called CharSetConverter for ConverterEBCDIC. This class is then extended to support the ICU library or iconv library. The ICU library currently takes priority over the iconv library.
Relevant RFCs:
https://discourse.llvm.org/t/rfc-adding-a-charset-converter-to-the-llvm-support-library/69795
https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512
PR to enable fexec-charset that depends on this:
abhina-sree#1
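The priority described in the PR summary can be sketched as a small selection function. This is an illustration under assumptions, not the code in the patch: Backend, BuildConfig and selectBackend are hypothetical names, the patch chooses between ICU and iconv at compile time through HAVE_ICU and HAVE_ICONV rather than through runtime flags, and preferring the built-in table for the UTF-8/IBM-1047 pair is assumed here because that part of CharSet.cpp is truncated in the diff.

```cpp
#include <optional>
#include <string_view>

// Hypothetical names for the three implementations mentioned in the PR.
enum class Backend { Table, ICU, Iconv };

// Runtime stand-in for the HAVE_ICU / HAVE_ICONV configuration macros.
struct BuildConfig {
  bool HaveICU;
  bool HaveIconv;
};

// Picks an implementation for a pair of already normalized encoding names.
// Returns std::nullopt when no available backend can serve the request.
std::optional<Backend> selectBackend(std::string_view From,
                                     std::string_view To, BuildConfig Cfg) {
  auto IsBuiltin = [](std::string_view Name) {
    return Name == "utf8" || Name == "ibm1047";
  };
  // Assumption: the table-based converter serves the built-in pair without
  // needing any external library.
  if (IsBuiltin(From) && IsBuiltin(To) && From != To)
    return Backend::Table;
  // ICU takes priority over iconv, as stated in the PR summary.
  if (Cfg.HaveICU)
    return Backend::ICU;
  if (Cfg.HaveIconv)
    return Backend::Iconv;
  return std::nullopt;
}
```

A configuration with neither library can still convert between UTF-8 and IBM-1047 in this sketch, which matches the header comment that the class always supports that pair; any other request fails and would map to std::errc::invalid_argument in the documented interface.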