- What is the best encoding (experiences...) for unicode? - 2 Updates
- About programming.... - 3 Updates
- Why all tutorials/books use non-unicode string? - 6 Updates
- thread interruption points - 3 Updates
- size of a variable - 5 Updates
- mysterious destructors - 1 Update
- OT: New, (post 20th century) M$ compiler question. - 1 Update
- something happened to malloc? - 1 Update
- mysterious destructors - 1 Update
- compiler bug operator>> matching? - 2 Updates
JiiPee <no@notvalid.com>: Feb 22 02:15PM We already started talking about this, but I will start a new thread as it is a separate issue. So what encoding do you guys use? UTF-8 or UTF-16? What is the recommendation, and what are your experiences? I read on the web that there was an argument about whether UTF-8 or UTF-16 is better, and both sides had strong arguments. But it seems that people here prefer UTF-8? And can you please briefly tell me how to use UTF-8 in practice? Like how to get its length, find a certain character, how to store it (well, I guess just a char[] array does the job, or even std::string). Does UTF-8 work with all the normal string functions like find, replace etc.? If not, how do you deal with these, and what needs to be done so they can be used? Say I use Russian letters: how do I practically find a certain letter and use all the SDT functions/classes? I am just quite new to this and trying to implement it in my first projects really. So looking for direction. People already gave instructions and I read them, but just asking if there is more. |
"Öö Tiib" <ootiib@hot.ee>: Feb 22 07:34AM -0800 On Sunday, 22 February 2015 16:15:28 UTC+2, JiiPee wrote: > UTF-8? Like how to get its length, find a certain character, how to > store it (well, i guess just a char [] array does the job, or even > std::string). On general case UTF-8 is superior since it: 1) is compatible with ASCII (ASCII text is subset of UTF-8) 2) does not have alignment issues (UTF-16 code point may need to be at even address) 3) does not have endianness issues (UTF-16 may be LE or BE) 4) fits into std::string (std::wstring is unspecified if it is UTF-16 or UTF-32 or something else entirely) 5) is encoding of majority of internet text content UTF-16 may be more convenient on Windows or with Qt. Still if significant part of input or output goes in UTF-8 (I already mentioned internet) then I would pick UTF-8 as internal representation for texts in your application. > Does UTF-8 work with all normal string functions like find, replace etc. You just have to accept that 'char' is a byte (not text character) and 'std::string' is continuous container of such bytes (specific encoding of possible text in it is not guaranteed by it). When that is accepted then everything works. > If not, how do you deal with these and what needs to be done so they can > be used. Say I use Russian letters, how I practically find a certain > letter and use all the SDT functions/classes. Not sure what you mean by "SDT". You have to make sure that when your program receives some text from somewhere then it may need to be converted to UTF-8 or at least checked if it *is* UTF-8 and when your program outputs text to somewhere then it may need to be converted to what is expected at other side (plus inevitable error handling). C++ itself offers too few and inconvenient methods for that so we typically seek help for converting and checking from outside of C++ standard. > projects really. So looking for direction. 
> People already gave instructions and I read them, but just asking if > there is more. The other tricky thing you eventually stumble upon is that sometimes people expect your program to ignore case of characters or to convert to upper-case or to convert to lower case (or even to title case) and how such things are done may be specific to local traditions. Again the implementations of C++ tend to be quite unhelpful with it. |
"Osmium" <r124c4u102@comcast.net>: Feb 22 06:51AM -0600 "Wouter van Ooijen" wrote: > If you expect me to follow a link to your wonderfull creations I'd first > like a line or two that explains what it will do for me. (That also gives > me a hint of how good you are at expressing yourself efficiently.) Perhaps someone could compute and post the fog index for Ramine's post - that would provide some insight as to just *how* wonderful he is.. |
"Öö Tiib" <ootiib@hot.ee>: Feb 22 05:39AM -0800 On Sunday, 22 February 2015 14:51:21 UTC+2, Osmium wrote: > > me a hint of how good you are at expressing yourself efficiently.) > Perhaps someone could compute and post the fog index for Ramine's post - > that would provide some insight as to just *how* wonderful he is.. It is his Turbo Pascal code that he talks about. It is non-topical in comp.lang.c++. It will never be clear why he posts it here. By posting his rants he has already ramined number of Usenet groups (like comp.programming, comp.programming.threads, alt.comp.lang.borland-delphi, comp.lang.pascal.misc etc.) into his personal blogs. So he is sort of wonderful in ruining low traffic Usenet groups. |
"Osmium" <r124c4u102@comcast.net>: Feb 22 07:53AM -0600 "嘱 Tiib" wrote: > comp.programming, comp.programming.threads, alt.comp.lang.borland-delphi, > comp.lang.pascal.misc etc.) into his personal blogs. So he is sort of > wonderful in ruining low traffic Usenet groups. I dropped comp.programming from my list of groups several months ago, basically the only signal was mostly noise. Looking at the group today I see that Raimine is a big poster there. The name meant nothing to me earlier this morning. |
jt@toerring.de (Jens Thoms Toerring): Feb 21 11:26PM > > on) - but which does the really interesting bits of work that > > make it something people may be motivated to pay for. > I believe you are deranged. Thank you;-) But I don't know how I deserve that distinction. Would you care to elaborate a bit about where you consider me to be completely wrong? Regards, Jens -- \ Jens Thoms Toerring ___ jt@toerring.de \__________________________ http://toerring.de |
Richard Damon <Richard@Damon-Family.org>: Feb 21 06:32PM -0500

On 2/21/15 10:05 AM, JiiPee wrote:
> (1 byte)? Why use examples which are not used in real world? This I do
> not understand.
> And even top C++ people like Bjarne do that.

There are some subtle issues with using Unicode which may not be important if you are doing a simple tutorial. Take what you mean by the length() function: if you want to know how much storage a string takes, it works just fine with Unicode. If you want to know how many characters will be displayed, that is actually quite tricky in Unicode (even using UTF-32/UCS-4 doesn't save you, as there are combining code points that let you build glyphs that don't have an assigned code point).

It actually turns out that much code written for ASCII will just work for UTF-8 encoded Unicode if you follow a few basic guidelines: you need to handle the high bit of the character set, which might cause issues with signed char; you need to manipulate strings at "known" points so you don't break apart multi-byte sequences; and you can't assume that N bytes are N characters.

UTF-16 is mostly just used in Microsoft environments, and actually is mostly a mistake. They adopted it when Unicode thought 16 bits were going to be "big enough", and when they changed their mind UTF-16 became an awkward orphan: it normally takes more space than UTF-8, and you still need to worry about multi-unit characters (only the exceptions are much rarer, so you might not catch the problem in testing). If I remember right, UTF-16 might make sense in the case of some Asian languages, where most "characters" will take 1 unit (2 bytes) in UTF-16, but might take 3 in UTF-8. |
Kai Bojens <kb@kbojens.de>: Feb 22 12:42PM +0100 > And difficult to find guidelines how to do it. So still searching > (some say use UTF-8 , some UTF-16. but using UTF-8 in a code would make > life difficult as many functions like length would not work). A very good starting point: https://github.com/boostcon/cppnow_presentations_2014/blob/master/files/unicode-cpp.pdf |
JiiPee <no@notvalid.com>: Feb 22 12:11PM On 22/02/2015 11:42, Kai Bojens wrote: >> life difficult as many functions like length would not work). > A very good starting point: > https://github.com/boostcon/cppnow_presentations_2014/blob/master/files/unicode-cpp.pdf Yes, finally examples too (like how to add a snowman to a char string). Many of these sites have theory but no practical examples... not really a good way to teach things. When I read C++ books the examples tell me almost everything even without knowing the theory!! That's why I often read the examples first and the theory after that, because then I also understand the theory. That's what is really needed with these. But let's see if it has enough examples.... it has some.... |
JiiPee <no@notvalid.com>: Feb 22 12:13PM he seems to be from Finland... even that (me also) :) On 22/02/2015 11:42, Kai Bojens wrote: |
"Lőrinczy Zsigmond" <nospam@for.me>: Feb 22 01:52PM +0100 On 2015.02.21. 16:05, JiiPee wrote:> I am trying to learn how to use unicode string.. its not so easy really. > And difficult to find a guidelines how to do it. So still searching > (some say use UTF-8 , some UTF-16. but using UTF-8 in a code would make > life difficult as many functions like lenght would not work). When using utf8, strlen does work, it returns the number of bytes; mbslen returns the number of characters. |
Ian Collins <ian-news@hotmail.com>: Feb 22 05:40PM +1300 Melzzzzz wrote: > Problem is that memory allocators (especially GC) tend to reserve huge > amount of RAM, (not to mention forks) therefore overcommit and OOM > killer.... Not all operating systems are foolhardy enough to allow memory over commit. -- Ian Collins |
Marcel Mueller <news.5.maazl@spamgourmet.org>: Feb 22 09:59AM +0100

On 21.02.15 22.41, Paavo Helde wrote:
> random fashion. Even if they are not failing, the computer is pretty much
> unusable anyway. Depending on the OS and running programs, a computer
> restart might be the best option to come out of the thrashing mode. I am

You are right. I can confirm this. Win7 discards the disk cache on suspend to disk. This is comparable to swapping after resume. About 1GB of data is read with heavy disk activity in the first few minutes after resume - probably in 4k blocks due to page faults. During this time the system is almost unresponsive and random faults occur from time to time, e.g. drivers that no longer recognize their devices, or program windows that can no longer rearrange their Z order (this can happen to any window, including simple explorer windows). Of course, nothing bad happens as long as you have only a few application windows open at suspend and the cache is quite small. So the concept is well designed to survive a feature presentation, no more, no less. (What the hell came over them when they decided to discard the cache?)

I once ran into a similar problem on a Linux VM server too. I started one VM too many and memory got very low. It was impossible to get a shell to suspend one of the VMs in a reasonable amount of time, so I decided on a hard reset.

> starting to think that turning the pagefile off completely might be the
> best approach.

Unfortunately you have to be careful here. Depending on the OS this might have unwanted side effects. Some OSes refuse to overcommit memory when there is absolutely no swap. Maybe it is some reliable operating mode intended for cash terminals or something like that. This will likely throw out-of-memory exceptions very soon when using ordinary desktop applications. Other OSes simply ignore your configuration and create a temporary swap file on the system volume in this case.

> That's why throw specifications is a deprecated antifeature.

Is it?

> functions with an empty throw clause should not call anything
> non-trivial, if they do there is a large problem between the keyboard and
> chair.

Well, that's the old discussion of whether to have checked exceptions or not. Unfortunately, when using generic functors or lambdas you have almost no choice. You cannot reasonably use checked exceptions with them, as it would require the throws declaration to be a type argument. (Is this allowed at all?)

> the allocation. Alternatively, if the program itself is the memory hog,
> then it can probably release a lot of it by stack unwinding (in the
> correct stack!), then report a failure.

I think it always depends on the individual case. And the basic question is simply who pays to cover all these cases. Probably no-one.

> Dynamic memory allocation can be handled relatively well in C++.

In the language: yes. In C++ libraries, well, it depends.

> overflow is a different beast altogether, there are no standard
> mechanisms for dealing with that and most program(mer)s just ignore the
> problem and hope they get lucky.

Indeed.

Marcel |
Paavo Helde <myfirstname@osa.pri.ee>: Feb 22 04:16AM -0600 Marcel Mueller <news.5.maazl@spamgourmet.org> wrote in > On 21.02.15 22.41, Paavo Helde wrote: >> That's why throw specifications is a deprecated antifeature. > Is it? Yes. See Annex D (normative) - Compatibility features: "D.4 Dynamic exception specifications [depr.except.spec] The use of dynamic-exception-specifications is deprecated." Instead, one should use the new C++11 'noexcept' specification. For motivations, see: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3051.html Cheers Paavo |
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 16 11:20PM On 16/02/2015 22:51, Christopher Pisz wrote: > Or you're just trolling, since you can't seem to reply in the > appropriate thread, much less quote someone when making false claims > about what they said and didn't say. False claims? Evidence: (It was actually uint8_t not uint16_t) "Calling everything a uint8_t rather than an unsigned long long accomplishes what in 2015?! " "Is it because you want to remind yourself that an unsigned long long is 8 bytes? Is it guaranteed to be 8 bytes by the standard anyway?" Now stfu and gtfo you little liar. /Flibble |
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 16 07:46PM On 16/02/2015 19:15, Christopher Pisz wrote: >> /Flibble > Can you at least go argue your nonsensical points of view in the > appropriate thread? What is nonsense exactly? First you made the mistake of thinking uint16_t was 16 bytes not 16 bits and then you posted a diatribe compounding your mistake rather than admitting to it. Am I wrong? /Flibble |
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 16 07:04PM On 16/02/2015 17:39, Christopher Pisz wrote: >> typedefs? Progress. >> /Flibble > He isn't mixing and matching types here is he? That's a yes then. Next time think before you post tons of bullshit. /Flibble |
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 16 10:49PM On 16/02/2015 22:44, Christopher Pisz wrote: > I said no such thing. > Go quote me in the appropriate post and I will try my best to break it > down to your level of understanding. Either you have memory problems or you are a bald faced liar; either way the evidence is there. /Flibble |
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 16 05:19PM On 16/02/2015 16:30, Christopher Pisz wrote: > instead. Or you can calculate it yourself if you want to use the raw > array by multiplying the size of the type (uint8_t) by the number of > elements. So you've finally accepted that there is nothing wrong with the sized typedefs? Progress. /Flibble |
Mr Flibble <flibbleREMOVETHISBIT@i42.co.uk>: Feb 18 12:55AM On 18/02/2015 00:50, Christopher Pisz wrote: > words: "Rick C. Hodgins", "Flibble" > So, I won't be able to see or respond to any such messages > ----- Mr Flibble is very cross. |
DSF <notavalid@address.here>: Feb 14 01:13AM -0500 On Fri, 13 Feb 2015 18:09:38 -0600, Christopher Pisz >on my shelf, licensed for my lifetime. They are all out to squeeze more >money out of Joe User. Businesses already subscribe to MSDN anyway and >have access to whatever they like. I *HATE* subscription models. Not only for the reasons you mention, but also because you have no choice when it comes to upgrading. If they've removed a feature you use often (but maybe most users don't), or changed how every operation works because they've discovered a "better" way - Tough Luck! You can't fall back on your previous version because it doesn't exist. >At any rate. If you are looking for a new version of visual studio, I'd >say now is not the time to buy, as MSVS 2014 is very near to release. As you noted in your follow-up post, the Community version is free. Hence my disbelief that it is identical to the Pro version, save for licensing. I already have downloaded it, but I am having VirtualBox problems. I will not install it into my main system. "C:" is not my programming drive, "E:" is. The last time I installed a version of VS Express, I set every path I could change to E:\etc... It installed about 20% of itself on E: and the other 80% in C:\Windows and C:\Program Files. And after uninstallation, it left so much of itself on C: that I wound up deleting/reinstalling Windows to get rid of the bloat. By the way, I guess that's another reason I've stuck with the Borland compiler for so long. It stores very little in the Windows directory and, because of that, is fairly portable. Thanks again, DSF "'Later' is the beginning of what's not to be." D.S. Fiscus |
Robert Wessel <robertwessel2@yahoo.com>: Feb 19 11:32AM -0600 On Wed, 18 Feb 2015 09:29:15 +0100, Torsten Mueller >In my application I use also boost (1.57). And boost has an own cstdlib >header. Could this be the reason? >Has anyone had this problem in the last time, malloc is undefined? Could you be seeing a namespace problem? While cstdlib mostly includes stdlib.h, it does put most items into the std:: namespace. Perhaps the headers got tightened up recently to not put those functions in both std:: and the global namespace? |
ram@zedat.fu-berlin.de (Stefan Ram): Feb 19 04:04AM

> o->print();
> o = std::unique_ptr<c>(new c( 2 )); /* overwrite */
> o->print(); }

Actually, I wanted to observe how C++ interprets an example someone posted into the C newsgroup recently. Here is my attempt of a translation into C++:

#include <iostream>
#include <ostream>
#include <memory>

struct c /* this struct is as above (as before) */
{ int v;
  c( int const x ): v( x )
  { ::std::cout << "constructor of instance #" << v << ".\n"; }
  ~c(){ ::std::cout << "destructor of instance #" << v << ".\n"; }
  void print(){ ::std::cout << "I am instance #" << v << ".\n"; }};

::std::unique_ptr< c >f( ::std::unique_ptr< c >* p )
{ *p = ::std::make_unique< c >( 2 ); /* <- sequence point! (semicolon) */
  return ::std::make_unique< c >( 1 ); }

int main()
{ ::std::unique_ptr< c >o( f( &o ));
  o->print(); }

Does this program violate any rule of C++? |
"Norman J. Goldstein" <normvcr@telus.net>: Feb 14 09:23AM -0800 On 02/13/2015 09:35 PM, Pavel wrote: > (in the OP's example specifically, 3 characters). I am curious about the real > intent, too, though. > -Pavel operator>>( istream&,const char* ) succeeds if the extracted characters exactly match the supplied const char*. Leading white space is ignored. I find this a convenient way to help parse a file. |
Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo>: Feb 14 12:35AM -0500 > your example program even to a string literal, which may > reside in read-only memory)? > Best regards, Jens There can be different implementations of operator>>. For example, one might want to use it to try to extract from the stream and throw away as many characters as the length of the C string pointed to by the const char* parameter (in the OP's example specifically, 3 characters). I am curious about the real intent, too, though. -Pavel |