- Simple C++ Source Parser? - 9 Updates
- #ifdef with enum member - 1 Update
- enum class or enum? - 2 Updates
- who's at fault, me or compiler? - 3 Updates
- Conditional compilation - struct contains member - 1 Update
Mike Copeland <mrc2323@cox.net>: Jul 19 04:34PM -0700 I am working on a C++ source file analyzer; I had one that worked for C sources. That program was written many years ago, and I'm attempting to update it for C++ code, as well as use C++ structures and features. The code for parsing C++ code is tedious, and I'm looking for a library or functional code that will (1) parse non-comment code elements and (2) return token strings. Is there something I can link to/use that will help me? I've done some Google searching and have seen references to Clang, Elsa, Metre and ANTLR - all of which seem much more than I need. I just want source code tokens and to know which source code line they're from. TIA |
Ian Collins <ian-news@hotmail.com>: Jul 20 11:41AM +1200 On 20/07/2020 11:34, Mike Copeland wrote: > some Google searching and have seen references to Clang, Elsa, Metre and > ANTLR - all of which seem much more than I need. I just want source > code tokens and to know which source code line they're from. https://clang.llvm.org/doxygen/classclang_1_1Parser.html -- Ian. |
Sam <sam@email-scan.com>: Jul 19 08:41PM -0400 Mike Copeland writes: > ANTLR - all of which seem much more than I need. I just want source > code tokens and to know which source code line they're from. > TIA Maybe for some subset of C++ grammar one could come up with a "Simple C++ Source Parser". But in order to parse the full C++ syntax, especially c++11 and higher, that's a project of a lifetime, I'm afraid. |
"Öö Tiib" <ootiib@hot.ee>: Jul 19 11:35PM -0700 On Monday, 20 July 2020 02:34:14 UTC+3, Mike Copeland wrote: > some Google searching and have seen references to Clang, Elsa, Metre and > ANTLR - all of which seem much more than I need. I just want source > code tokens and to know which source code line they're from. In the general case it is impossible to make a simple parser for logical analysis of C++ code, since its grammar is large and full of complications. The simplest parsers are made for syntax highlighting or automatic reformatting, but they are usually uninterested in the meaning of the code, so their results are not suitable for substantive analysis. Example: Artistic Style. There are somewhat more aware parsers made for automatic documentation. Those are more complex, and you will get somewhat better tagged results from them. Additionally, those parsers tend to pay a lot of attention to the contents of comments, but you can ignore that aspect. Example: Doxygen. Even those are far from trivial, but since we do not know your goals and you say Clang is more than you need, perhaps try them. |
Scott Newman <scott69@gmail.com>: Jul 20 08:42AM +0200 C++ is only a bit more complicated to parse than C. Try to write a parser on your own. |
David Brown <david.brown@hesbynett.no>: Jul 20 10:49AM +0200 On 20/07/2020 01:34, Mike Copeland wrote: > ANTLR - all of which seem much more than I need. I just want source > code tokens and to know which source code line they're from. > TIA I think it is unlikely that you'll get far without using a big project. Parsing C++ has got more and more difficult - there are more new syntaxes, context-dependent keywords, even a new operator in the latest version. I would recommend you look again at existing parsers, and see if you can learn to use them. It might take you time to get the hang of clang as a parser, but that's a job you do once - and then you can take advantage of all the work they do and you don't have to update or re-write things for each new C++ version. clang is /designed/ to be usable as a library, and as a parser, for syntax highlighting in IDEs, for making static analysers, for JIT compilation, and other tools. I don't know the other tools you mentioned, but I personally would definitely concentrate on clang first. I'd start with the existing clang analyser, and see where that could take me - that could be a very good starting point for adding the new analysers that interest you. (gcc might also be worth a look these days. There is an analyser framework in the latest version, and there is support for plugins that can get access to parsed source information for checking, with existing plugins for other kinds of static or style checking. There is even a project underway for making a JIT compiler library of gcc. I don't think gcc is as far down this path as clang, but maybe it is of use.) |
Paavo Helde <eesnimi@osa.pri.ee>: Jul 20 12:55PM +0300 20.07.2020 02:34 Mike Copeland kirjutas: > ANTLR - all of which seem much more than I need. I just want source > code tokens and to know which source code line they're from. > TIA If you just want tokens without any knowledge what they mean, then this should be pretty straightforward, the C++ preprocessor does exactly that: removes comments and outputs token strings. It also helpfully adds extra spaces between tokens which would otherwise appear glued together. It also outputs file names and line numbers so keeping track of line numbers should be easy. So if I was given this task, I would start with getting my toolchain to output preprocessed source files instead of object files. In preprocessed source, extracting tokens is simple in general, except for string literals and especially raw string literals which are a bit more tricky. Beware though that tokens without meaning do not give you much. If all you know is that there is a token 'final' on line 5095, it does not even tell you if this is a C++ keyword or some other name, not to speak about in which scope, namespace or class it belongs to. Also, if there are different preprocessor branches, only one of them survives after preprocessing step. |
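To make the "tokens plus line numbers" idea from the post above concrete, here is a minimal sketch of a tokenizer that skips comments and records the line on which each token starts. It is an illustration only, not a conforming C++ lexer: the names Token and tokenize are hypothetical, and it deliberately ignores raw string literals, multi-character operators, line splicing and the "#" line markers that a preprocessor emits.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type for this sketch: a token and the line it starts on.
struct Token {
    std::string text;
    int line;
};

// Simplified tokenizer (an assumption-laden sketch, not a full C++ lexer).
// Skips // and /* */ comments, keeps string and character literals whole,
// and splits the rest into identifiers, numbers and one-character punctuation.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    int line = 1;
    std::size_t i = 0;
    const std::size_t n = src.size();
    auto isAlnum = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0;
    };
    while (i < n) {
        const char c = src[i];
        if (c == '\n') { ++line; ++i; continue; }
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (c == '/' && i + 1 < n && src[i + 1] == '/') {
            // Line comment: stop at the newline so it is counted above.
            while (i < n && src[i] != '\n') ++i;
            continue;
        }
        if (c == '/' && i + 1 < n && src[i + 1] == '*') {
            // Block comment: keep counting lines while skipping it.
            i += 2;
            while (i < n && !(src[i] == '*' && i + 1 < n && src[i + 1] == '/')) {
                if (src[i] == '\n') ++line;
                ++i;
            }
            i = (i < n) ? i + 2 : n;  // step over the closing */ if present
            continue;
        }
        const std::size_t start = i;
        const int startLine = line;
        if (c == '"' || c == '\'') {
            // String or character literal, honouring backslash escapes.
            ++i;
            while (i < n && src[i] != c) {
                if (src[i] == '\\' && i + 1 < n) ++i;  // skip the escaped char
                if (src[i] == '\n') ++line;
                ++i;
            }
            if (i < n) ++i;  // closing quote
        } else if (isAlnum(c) || c == '_') {
            // Identifier, keyword or (roughly) a number such as 42 or 3.14.
            while (i < n && (isAlnum(src[i]) || src[i] == '_' ||
                             (std::isdigit(static_cast<unsigned char>(c)) && src[i] == '.'))) {
                ++i;
            }
        } else {
            ++i;  // single punctuation character
        }
        out.push_back({src.substr(start, i - start), startLine});
    }
    return out;
}
```

For the input `int x = 42; // answer`, this sketch yields the tokens int, x, =, 42 and ; all tagged with line 1. As the post warns, it says nothing about what a token means, so a token such as final is reported the same way whether it is a variable name or a class specifier.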
Keith Thompson <Keith.S.Thompson+u@gmail.com>: Jul 20 10:53AM -0700 Paavo Helde <eesnimi@osa.pri.ee> writes: [...] > you know is that there is a token 'final' on line 5095, it does not > even tell you if this is a C++ keyword or some other name, not to > speak about in which scope, namespace or class it belongs to. If you see 'final' on line 5095, you can be sure it's not a C++ keyword. -- Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com Working, but not speaking, for Philips Healthcare void Void(void) { Void(); } /* The recursive call of the void */ |
James Kuyper <jameskuyper@alumni.caltech.edu>: Jul 20 06:42PM -0400 On 7/20/20 1:53 PM, Keith Thompson wrote: >> even tell you if this is a C++ keyword or some other name, not to >> speak about in which scope, namespace or class it belongs to. > If you see 'final' on line 5095, you can be sure it's not a C++ keyword. For the benefit of those who don't already know what Keith is referring to: the C++ standard does not describe either override or final as keywords. Instead, what it says about them is: "The identifiers in Table 4 have a special meaning when appearing in a certain context. When referred to in the grammar, these identifiers are used explicitly rather than using the identifier grammar production. Unless otherwise specified, any ambiguity as to whether a given identifier has a special meaning is resolved to interpret the token as a regular identifier." Note that this is a meaningful distinction. Keywords can never be used as regular identifiers - they are always parsed as keywords, and if they appear in a place where that keyword is not permitted, it's a syntax error. When used anywhere other than the specific context where they're referred to in the grammar, override and final can be used as ordinary identifiers. In principle, the part about ambiguities being resolved in favor of the regular identifier is another difference, but after a careful review of all of the relevant grammar rules, I can't figure out any way to create such an ambiguity - I may have missed something. |
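The distinction described above can be seen in a short sketch. The names Base, Derived and demo are hypothetical and exist only for this illustration; the point is that final and override carry their special meaning only in specific grammatical positions and are ordinary identifiers everywhere else.

```cpp
struct Base {
    virtual int value() const { return 1; }
    virtual ~Base() = default;
};

// Special meaning: 'final' after the class name forbids further derivation.
struct Derived final : Base {
    // Special meaning: 'override' after the declarator requires that this
    // function overrides a virtual function of a base class.
    int value() const override { return 2; }
};

// Ordinary identifiers: a variable named 'final' and a function named
// 'override' are both legal, because neither word is a keyword.
int final = 10;
int override() { return 20; }

// Hypothetical helper that uses both names as plain identifiers and also
// calls the overriding function through a base reference.
int demo() {
    Derived d;
    const Base& b = d;
    return final + override() + b.value();
}

// By contrast, a true keyword can never be an identifier, so a declaration
// such as 'int class = 1;' is rejected by the compiler.
```

This is also why a token-only tool cannot classify final by spelling alone: it has to know where in the grammar the token appears.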
Frederick Gotham <cauldwell.thomas@gmail.com>: Jul 20 02:31PM -0700 Let me start off with this easy example:
#define MONKEY 5
#ifdef MONKEY
int Func(void) { return MONKEY; }
#else
int Func(void) { return 0; }
#endif