Sintaksis Menyoroti untuk Golf yang Lebih Besar!

14

Pegolf.

Bersama-sama, kami telah bersatu untuk menghasilkan kode yang ringkas, fungsional indah dan lebih jelek daripada Phantom of the Opera dari novel asli.

Waktunya telah tiba bagi kita untuk membawa keindahan kembali ke dunia pemrograman. Dengan warna. Dengan cara yang ringkas, fungsional cantik lebih jelek daripada Phantom of the Opera dari novel aslinya.

Kita akan mengkodekan stabilo sintaks yang berwarna-warni. Dalam jumlah kode sesingkat mungkin.

Anda akan menerima melalui file input atau Stdin file C yang valid. File C akan menggunakan konvensi baris pilihan Anda, dan hanya akan berisi karakter ASCII 32-126. Anda harus mengubahnya menjadi file HTML yang menampilkan dengan benar setidaknya di Chrome, yang menunjukkan kode sumber dengan penyorotan sintaksis. Outputnya mungkin dalam file atau ke Stdout.

Anda harus menyoroti:

  • Semua string dan karakter (termasuk karakter kutipan) dalam warna Hijau (# 00FF00). String dapat berisi karakter yang lolos.

  • Semua kata C dilindungi warna Biru (# 0000FF).

  • Semua komentar dalam warna Kuning (# FFFF00).

  • Semua arahan preprosesor C dalam Pink (# FF00FF).

Output saat ditampilkan di Chrome harus:

  • Berada dalam font dengan lebar tetap

  • Tampilkan baris baru di mana pun mereka muncul di sumber asli

  • Secara akurat mereproduksi spasi putih. Karakter tab harus dianggap sebagai 4 spasi.

Bonus

  • x 0,9 jika Anda memasukkan nomor baris. Nomor baris harus dapat mencapai setidaknya 99999. Semua sumber harus tetap selaras - jadi, kode sumber dengan nomor baris yang lebih kecil masih harus dimulai pada posisi yang sama dengan kode sumber dengan nomor baris yang lebih tinggi

  • x 0,8 jika latar belakang setiap baris berganti-ganti antara abu-abu terang (# C0C0C0) dan putih (#FFFFFF)

  • x 0,9 jika kode sumber Anda ditulis dalam C, dan dapat memformat dirinya dengan benar.

Mencetak gol

Ini golf kode. Skor Anda adalah jumlah byte kode sumber Anda dikalikan dengan bonus apa pun. Pemenangnya adalah pegolf dengan skor terendah.

lochok
sumber
1
@Mengganggu perilaku standar IDE yang lebih besar adalah, bahwa komentar tidak akan disorot lebih lanjut, yang berarti komentar tetap komentar. belum banyak bekerja dengan arahan preprocessor, tetapi wikipedia juga di sini tidak menyoroti kode lebih lanjut ... juga string adalah string, mereka tidak memerlukan evaluasi lebih lanjut ...
Vogel612
1
Mengonfirmasi apa yang dikatakan Vogel612 adalah perilaku yang benar
lochok
2
"valid setidaknya di Chrome", atau "ditampilkan dengan benar setidaknya di Chrome"?
John Dvorak
3
@Rimsty, Tunggu sebentar, aku idiot. xD Nevermind.
cjfaure
4
Warna terburuk yang pernah ada!
Greg

Jawaban:

8

Perl 769 karakter * 0,9 * 0,8 = 554

Mungkin masih ada beberapa perbaikan yang harus dilakukan pada beberapa regex, tetapi perlahan-lahan sampai di sana!

$_=join"",<>;$s="<tt class";$c="</tt>";$d=counter;$e=color;s/\t/    /g;s!<!&lt;!g;s!>!&gt;!g;s!^#.+(?=$|
)!$s=d>$&$c!gm;s!//.+!$s=c>$&$c!g;s|(['"]).*?(?<!\\)(\\\\)*\1|($h=$&)=~s!/!&#47;!g;"$s=s>$h$c"|smeg;s!/\*.*?\*/!$s=c>$&$c!smg;s!\b(_Packed|(au|go)to|break|c(ase|har|onst|ontinue)|d(efault|o|ouble)|e(lse|num|xtern)|f(loat|or)|if|int|long|re(gister|turn)|short|(un)?signed|s(izeof|tatic|truct|witch)|typedef|union|vo(id|latile)|while)\b!$s=r>$&$c!g;s!=(\w)>.+?$c!join"$c
$s=$1>",split$/,$&!smeg;s!
!<tr><td>!g;print"<style>body{font:10px monospace;$d-reset:n}td{white-space:pre}tr:nth-child(even){background:#c0c0c0}tr:before{$d-increment:n;content:$d(n)}.d{$e:#f0f}.r{$e:#00f}.c{$e:#ff0}.s{$e:#0f0}tt tt{$e:inherit!important}</style><table cellspacing=0><tr><td>$_"

Versi yang sedikit kurang dikaburkan dengan komentar:

$_=join"",<>; # slurp file
$s="<tt class"; # used later - use <tt/> instead of <span/>, fewer chars!
$c="</tt>";
$d=counter;
$e=color;
s/\t/    /g; # convert tabs to spaces
s!<!&lt;!g; # htmlentity < and >
s!>!&gt;!g;
s!^#.+(?=$|\n)!$s=d>$&$c!gm; # directives
s!//.+!$s=c>$&$c!g; # inline comments
s|(['"]).*?(?<!\\)(\\\\)*\1|($h=$&)=~s!/!&#47;!g;"$s=s>$h$c"|smeg; # strings, might have 0 length - thanks @Einacio; work-around string that contain /* by converting them to HTML entities
s!/\*.*?\*/!$s=c>$&$c!smg; # multi-line comments
s!\b(_Packed|(au|go)to|break|c(ase|har|onst|ontinue)|d(efault|o|ouble)|e(lse|num|xtern)|f(loat|or)|if|int|long|re(gister|turn)|short|(un)?signed|s(izeof|tatic|truct|witch)|typedef|union|vo(id|latile)|while)\b!$s=r>$&$c!g; # reserved words, don't optimise this too much! - thanks @bwoebi
s!=(\w)>.+?$c!join"$c
$s=$1>",split$/,$&!smeg; # any multi-line string/comment, ensure <span/>s are repeated
s!\n!<tr><td>!g; # strip newlines, replace with <tr><td>, don't need </td> _or_ </tr> - thanks @xfix!
print"<style>
body{font:10px monospace;$d-reset:n} /* init counter */
td{white-space:pre} /* preserve whitespace */
tr:nth-child(odd){background:#c0c0c0} /* alternating rows */
tr:before{$d-increment:n;content:$d(n)} /* place counter */
.d{$e:#f0f} /* highlights */
.r{$e:#00f}
.c{$e:#ff0}
.s{$e:#0f0}
tt tt{$e:inherit!important} /* ignore reserved words/comments in strings */
</style><table cellspacing=0><tr><td>$_"

Sekarang berhasil menyoroti entri @ xfix.

Meminjam gagasan untuk </tr>keluar dari entri @ xfix, terima kasih!

Contoh output untuk solusi @ xfix .

Dom Hastings
sumber
1
Solusi Anda tidak dapat menyoroti solusi saya dengan benar, karena itu tidak luput dari markup HTML, dan tidak menampilkan spasi dengan benar.
Konrad Borowski
1
Di HTML5, </tr>dan </td>sepenuhnya opsional, jadi saya mengabaikannya.
Konrad Borowski
1
Anda sebenarnya dapat menyimpan beberapa karakter dengan terkadang menggunakan nama kata kunci lengkap di regex. Misalnya if|intsatu karakter kurang dari i(f|nt). Atau d(efault|o|ouble)char lain kurang dari d(efault|o(uble)?).
bwoebi
1
pada biola '\\' pada baris 61 tidak ditandai sebagai string. kesalahan contoh atau kasus tepi gagal?
Einacio
1
Kode masih akan berfungsi jika Anda memindahkan <style>blok ke ujung dan menghilangkan tag akhir. Anda kemudian dapat menghilangkan yang terakhir }dari gaya. Tentu, ini sepenuhnya tidak valid, tetapi bekerja di Chrome!
user2428118
7

C - 1605 1200 karakter * 0,9 * 0,8 * 0,9 = 777 karakter

Pasti terlalu lama, tapi terserahlah. 264 digunakan oleh daftar kata kunci itu sendiri. Versi long one liner. Tidak menggunakan alokasi memori, sehingga penggunaan memori sangat rendah (dan semuanya bersifat global, sehingga tumpukan tidak benar-benar digunakan). Contoh HTML di JSFiddle . Menurut pendapat saya, dukungan komentar adalah hal yang paling kompleks dalam kode.

char*k[]={"auto","break","case","char","const","continue","default","do","double","else","enum","extern","float","for","goto","if","int","long","register","return","short","signed","sizeof","static","struct","switch","typedef","union","unsigned","void","volatile","while"},b[9];c;p;e;p;l=1;q;s;i;main(){printf("<style>tr:nth-child(2n){background:#C0C0C0}</style><table style=font-family:monospace;white-space:pre-wrap><tr><td>1<td>");while(~(c=getchar())){if(!e&&q){if(q==c){printf("%c</span>",c);q=0;continue;}}else if(isalpha(c)&&p<8){b[p++]=c;continue;}else if(b[0]){for(i=0;i<32;i++){s!=2&&!strcmp(k[i],b)&&(printf("<span style=color:#00F>%s</span>",b),b[0]=0);}printf("%s",b);memset(b,0,9);}p=0;switch(c){case'<':printf("&lt;");goto e;case'&':printf("&amp;");goto e;case 92:putchar(c);e^=1;goto e;case'/':q||1==s?putchar('/'):3==s?(printf("*/"),s=0):(s=1);break;case'*':if(s==1){s=2;printf("<span style=color:#FF0>/*");}else if(s==2){s=3;}else{goto d;}break;case 39:case'"':e||q||(q=c,printf("<span style=color:#0F0>"));e=0;goto d;case 10:l+=1;printf("<tr><td>%d<td>",l);if(p&&e){case'#':e=0;q||(p=1,printf("<span style=color:#F0F>"));}else{p=0;}default:d:10!=c&&putchar(c);e:s=s/2*2;}}puts(b);}

Dan versi yang lebih panjang (yang dapat dibaca sebagai program nyata, selain beberapa trik kode golf saya tidak berpikir saya bisa menerapkannya dengan mudah saat bermain golf program.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#define ARRAY_SIZE(array) (sizeof(array) / sizeof *(array))
/* Sample comment. */
int main(void) {
    const char *keywords[] = {
        "auto",     "break",    "case",     "char",     "const",
        "continue", "default",  "do",       "double",   "else",
        "enum",     "extern",   "float",    "for",      "goto",
        "if",       "int",      "long",     "register", "return",
        "short",    "signed",   "sizeof",   "static",   "struct",
        "switch",   "typedef",  "union",    "unsigned", "void",
        "volatile", "while",
    };
    int character;
    int preprocessor = 0;
    int escape = 0;
    char buffer[9] = {0};
    int pos = 0;
    int line = 1;
    int quote = 0;
    int comment_state = 0;
    printf("<style>tr:nth-child(2n){background:#C0C0C0}</style><table style=font-family:monospace;white-space:pre-wrap><tr><td>1<td>");
    while ((character = getchar()) != EOF) {
        if (!escape && quote) {
            if (quote == character) {
                printf("%c</span>", character);
                quote = 0;
                continue;
            }
        }
        else if (isalpha(character) && pos < 8) {
            buffer[pos] = character;
            pos += 1;
            continue;
        }
        else if (buffer[0]) {
            int i;
            for (i = 0; i < ARRAY_SIZE(keywords); i++) {
                if (comment_state != 2 && strcmp(keywords[i], buffer) == 0) {
                    printf("<span style=color:#00F>%s</span>", buffer);
                    buffer[0] = 0;
                }
            }
            printf("%s", buffer);
            memset(buffer, 0, 9);
        }
        pos = 0;
        switch (character) {
        case '<':
            printf("&lt;");
            goto e;
        case '&':
            printf("&amp;");
            goto e;
        case '\\':
            putchar(character);
            escape ^= 1;
            goto e;
        case '/':
            if (quote || comment_state == 1) {
                putchar('/');
            }
            else if (comment_state == 3) {
                printf("*/");
                comment_state = 0;
            }
            else {
                comment_state = 1;
            }
            break;
        case '*':
            if (comment_state == 1) {
                comment_state = 2;
                printf("<span style=color:#FF0>/*");
            }
            else if (comment_state == 2) {
                comment_state = 3;
            }
            else {
                goto d;
            }
            break;
        case '\'':
        case '"':
            if (!escape && !quote) {
                quote = character;
                printf("<span style=color:#0F0>");
            }
            escape = 0;
            goto d;
        case '\n':
            line += 1;
            printf("<tr><td>%d<td>", line);

            /* Execute next only if conditions match. */
            if (preprocessor && escape) {
        case '#':
                escape = 0;
                if (!quote) {
                    preprocessor = 1;
                    printf("<span style=color:#F0F>");
                }
            }
            else {
                preprocessor = 0;
            }
            /* fallthru */
        default:
        d:
            if (character != '\n') putchar(character);
        e:
            comment_state = comment_state / 2 * 2;
        }
    }
    printf("%s</table>", buffer);
}
Konrad Borowski
sumber
Outputnya terlihat fantastis!
lochok
@lochok: Dan sekarang disingkat menjadi 1200 byte. Ini masih lebih lama dari solusi Perl, tetapi sekarang kesenjangannya lebih kecil.
Konrad Borowski
1
Kode Anda sepertinya tidak berfungsi dengan komentar multiline.
nickguletskii
3

PHP 606 byte × 0,9 × 0,8 = 436

<style>li:nth-child(odd){background:#f5f5f5}pre{tab-size:4;-moz-tab-size:4}</style><pre><ol><li><?php $p='preg_match';preg_match_all('_\w+|("|\')(\\\\?.)*?\1|#(.(?!/[/*]))*|//.*|(?s)/\*.*?\*/|.+?_',stream_get_contents(STDIN),$m);foreach($m[0]as$t)echo'</font><font color=#',$t[0]=='#'?'d0d':($p('_^/[/*]_',$t)?'bb0':($p('/"|\'/',$t)?'0d0':($p('/^(auto|break|(cas|continu|doubl|els|volatil|whil)e|char|(cons|defaul|floa|in|shor|struc)t|do|enum|extern|for|goto|if|long|register|return|sizeof|static|switch|typedef|union|(un|)signed|void)$/',$t)?'00f':0))),'>'.preg_replace("/\r?\n/",'<li>',htmlentities($t));

Diformat:

<style>li:nth-child(odd){background:#f5f5f5}pre{tab-size:4;-moz-tab-size:4}</style>
<pre><ol><li><?php
$p='preg_match';
preg_match_all('_\w+|("|\')(\\\\?.)*?\1|#(.(?!/[/*]))*|//.*|(?s)/\*.*?\*/|.+?_',
    stream_get_contents(STDIN),$m);
foreach($m[0]as$t)
    echo'</font><font color=#',
        $t[0]=='#'?'d0d':(
        $p('_^/[/*]_',$t)?'bb0':(
        $p('/"|\'/',$t)?'0d0':(
        $p('/^(auto|break|(cas|continu|doubl|els|volatil|whil)e|char|'.
        '(cons|defaul|floa|in|shor|struc)t|do|enum|extern|for|goto|if|long|register|'.
        'return|sizeof|static|switch|typedef|union|(un|)signed|void)$/',$t)?'00f':
        0))),
        '>'.preg_replace("/\r?\n/",'<li>',htmlentities($t));
  • Membaca dari stdin dan menulis ke stdout.

  • Akhir baris yang diterima adalah \ n dan \ r \ n.

  • Apakah nomor baris dan pergantian warna garis.

  • Saya menggunakan warna yang sedikit berbeda sehingga saya tahan melihatnya, meskipun tidak dengan cara yang mempengaruhi jumlah byte.

  • Saya tidak punya Chrome untuk mengujinya meskipun tidak masalah di Firefox.

Boann
sumber
2

C ++ - 5067 bytes 4612 * 0,9 * 0,8 = 3320 (* 0,9 = 2988 jika mampu memformat sendiri diperhitungkan - ditulis dalam C ++)

Saya menyadari bahwa ini lebih besar daripada solusi yang telah disajikan di sini, tetapi saya tetap memutuskan untuk memposting ini karena saya mulai mengerjakan versi saya sebelum solusi C oleh xfix diposting.

  • Ini bekerja dengan komentar multiline
  • Ini menghasilkan HTML yang memiliki kesalahan (tetapi ditampilkan dengan benar di Chrome)
  • Bunyinya keluar dari input.c dan menghasilkan output.html

Setengah dari ini adalah array besar kata kunci C dan C ++.

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <locale>
#define B break
#define Z(a,b)if(s[i]==a){z=b;o+=OP+p(s[i]);i++;B;}
#define V(a,b)if(e(s,i,a)){z=b;o+=OC+p(s.substr(i,2));i+=2;B;}
#define N(a) if(s[i]=='\n'){a;o+=nl();i++;B;}
#define Q(a)if(e(s,i,"\\")){o+=p(s.substr(i,2));i+=2;B;}if(s[i]==a){z=0;o+=p(s[i])+CL;i++;B;}
#define C case
using namespace std;string k[]={"__abstract","__alignof","_Alignas","_alignof","and","and_eq","__asm","_asm","asm","__assume","_assume","auto","__based","_based","bitand","bitor","bool","_Bool","__box","break","__builtin_alignof","_builtin_alignof","__builtin_isfloat","case","catch","__cdecl","_cdecl","_Complex","cdecl","char","class","__compileBreak","_compileBreak","compl","const","const_cast","continue","__declspec","_declspec","default","__delegate","delete","do","double","dynamic_cast","else","enum","__event","__except","_except","explicit","__export","_export","extern","false","__far","_far","far","__far16","_far16","__fastcall","_fastcall","__feacpBreak","_feacpBreak","__finally","_finally","float","for","__forceinline","_forceinline","__fortran","_fortran","fortran","friend","_Generic","__gc","goto","__hook","__huge","_huge","huge","_Imaginary","__identifier","if","__if_exists","__if_not_exists","__inline","_inline","inline","int","__int128","__int16","_int16","__int32","_int32","__int64","_int64","__int8","_int8","__interface","__leave","_leave","long","__multiple_inheritance","_multiple_inheritance","mutable","namespace","__near","_near","near","new","_Noreturn","__nodefault","__nogc","__nontemporal","not","not_eq","__nounwind","__novtordisp","_novtordisp","operator","or","or_eq","__pascal","_pascal","pascal","__pin","__pragma","_pragma","private","__probability","__property","protected","__ptr32","_ptr32","__ptr64","_ptr64","public","__raise","register","reinterpret_cast","restrict","__restrict","__resume","return","__sealed","__serializable","_serializable","short","signed","__single_inheritance","_single_inheritance","sizeof","static","static_cast","_Static_assert","__stdcall","_stdcall","struct","__super","switch","__sysapi","__syscall","_syscall","template","this","__thiscall","_thiscall","throw","_Thread_local","__transient","_transient","true","__try","_try","try","__try_cast","typedef","typeid","typename","__typeof","__unaligned","__unhook","union","unsigned","using","__uuidof","_uuidof","__value","virtual","__virtual_inheritance","_virtual_inheritance","void","volatile","__w64","_w64","__wchar_t","wchar_t","while","xor","xor_eq"};string OS="<font color=\"#00FF00\">";string OC="<font color=\"#FFFF00\">";string OK="<font color=\"#0000FF\">";string OP="<font color=\"#FF00FF\">";string CL="</font>";string NL[]={ "<li class=\"li l1\">","<li class=\"li l2\">" };bool lo=1;string nl() {lo=!lo;return "</li>"+NL[lo];}bool r(char c,string s) {for (size_t i=0; i<s.size(); i++)if (c==s[i])return 0;return 0;}bool is(string s,int i) {return !(i<0||i>=s.size())&&((s[i]=='_')||isalpha(s[i]));}bool ic(string s,int i){return !(i<0||i>=s.size())&&(is(s,i)||('0'<=s[i]&&s[i]<='9'));}bool e(string a,int s,string b) {return !(a.size()-s<b.size())&&a.substr(s,b.size())==b;}string p(char c) {switch (c) {C  '&':return "&amp;";C  '\"':return "&quot;";C  '\'':return "&apos;";C  '<':return "&lt;";C  '>':return "&gt;";}stringstream s;s<<c;return s.str();}string p(string s) {string ans="";for (size_t i=0; i<s.size(); i++) {ans+=p(s[i]);}return ans;}string h(string s) {int z=0;size_t i=0;string o ="<html><body><style type=\"text/css\">.l{list-style-type: decimal;margin-top:0;margin-bottom:0;} .li{display:list-item;word-wrap:B-word;} .l1{background-color:#FFFFFF;} .l2{background-color:#EEEEEE;} .cd{white-space:pre;}</style><code class=\"cd\"><ul class=\"l\">";o+=NL[1];for (; i<s.size();) {switch (z) {C 0:{Z('#',6)Z('"',3)Z('\'',2)V("//",4)V("/*",5)N()for (size_t j=0; j<201; j++)if (e(s,i,k[j])&&!ic(s,i+k[j].size())) {o+=OK+p(k[j])+CL;i+=k[j].size();B;}if (is(s,i)) {z=7;o+=p(s[i]);i++;if (i+1==s.size()||!ic(s,i+1)) {z=0;}B;}o+=p(s[i]);i++;B;}o+=p(s[i]);i++;B;C  2:Q('\'')C  3:Q('"')C 4:{N(z=0)o+=p(s[i]); i++;B;}C 5:{if (e(s,i,"*/")) {z=0;o+=p(s.substr(i,2))+CL;i+=2;B;}N()o+=p(s[i]);i++;B;}C 6:{if (s[i]=='\n') {int j=i-1;for (; j>=0&&r(s[j],"\n\t "); j--);if (j<0||s[j] != '\\') {z=0;o+=CL+nl();i++;B;}o+=nl();i++;B;}o+=p(s[i]);i++;B;}C 7:{if (i+1==s.size()||!ic(s,i+1)) {z=0;}o+=p(s[i]);i++;B;}}}o+="</ul></code>";return o;}int main() {ifstream i("input.c");ofstream o("output.html");string cCode((istreambuf_iterator<char>(i)),istreambuf_iterator<char>());o<<h(cCode)<<endl;}

Versi yang dapat dibaca:

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
using namespace std;

//The 201 keywords from C and C++. Not sure if all of them are listed here!
const size_t NUMBER_OF_KEYWORDS = 201;
string keywords[] = { "__abstract", "__alignof", "_Alignas", "_alignof", "and",
        "and_eq", "__asm", "_asm", "asm", "__assume", "_assume", "auto",
        "__based", "_based", "bitand", "bitor", "bool", "_Bool", "__box",
        "break", "__builtin_alignof", "_builtin_alignof", "__builtin_isfloat",
        "case", "catch", "__cdecl", "_cdecl", "_Complex", "cdecl", "char",
        "class", "__compileBreak", "_compileBreak", "compl", "const",
        "const_cast", "continue", "__declspec", "_declspec", "default",
        "__delegate", "delete", "do", "double", "dynamic_cast", "else", "enum",
        "__event", "__except", "_except", "explicit", "__export", "_export",
        "extern", "false", "__far", "_far", "far", "__far16", "_far16",
        "__fastcall", "_fastcall", "__feacpBreak", "_feacpBreak", "__finally",
        "_finally", "float", "for", "__forceinline", "_forceinline",
        "__fortran", "_fortran", "fortran", "friend", "_Generic", "__gc",
        "goto", "__hook", "__huge", "_huge", "huge", "_Imaginary",
        "__identifier", "if", "__if_exists", "__if_not_exists", "__inline",
        "_inline", "inline", "int", "__int128", "__int16", "_int16", "__int32",
        "_int32", "__int64", "_int64", "__int8", "_int8", "__interface",
        "__leave", "_leave", "long", "__multiple_inheritance",
        "_multiple_inheritance", "mutable", "namespace", "__near", "_near",
        "near", "new", "_Noreturn", "__nodefault", "__nogc", "__nontemporal",
        "not", "not_eq", "__nounwind", "__novtordisp", "_novtordisp",
        "operator", "or", "or_eq", "__pascal", "_pascal", "pascal", "__pin",
        "__pragma", "_pragma", "private", "__probability", "__property",
        "protected", "__ptr32", "_ptr32", "__ptr64", "_ptr64", "public",
        "__raise", "register", "reinterpret_cast", "restrict", "__restrict",
        "__resume", "return", "__sealed", "__serializable", "_serializable",
        "short", "signed", "__single_inheritance", "_single_inheritance",
        "sizeof", "static", "static_cast", "_Static_assert", "__stdcall",
        "_stdcall", "struct", "__super", "switch", "__sysapi", "__syscall",
        "_syscall", "template", "this", "__thiscall", "_thiscall", "throw",
        "_Thread_local", "__transient", "_transient", "true", "__try", "_try",
        "try", "__try_cast", "typedef", "typeid", "typename", "__typeof",
        "__unaligned", "__unhook", "union", "unsigned", "using", "__uuidof",
        "_uuidof", "__value", "virtual", "__virtual_inheritance",
        "_virtual_inheritance", "void", "volatile", "__w64", "_w64",
        "__wchar_t", "wchar_t", "while", "xor", "xor_eq" };

// Different states
const int NONE = 0;
const int WHITESPACE = 1;
const int CHAR_UNCLOSED = 2;
const int STRING_UNCLOSED = 3;
const int LINE_COMMENT_UNCLOSED = 4;
const int MULTILINE_COMMENT_UNCLOSED = 5;
const int PREPROCESSOR_UNCLOSED = 6;
const int IDENTIFIER = 7;

//Different elements
const string OPEN_STRING = "<font color=\"#00FF00\">";
const string CLOSE_STRING = "</font>";
const string OPEN_COMMENT = "<font color=\"#FFFF00\">";
const string CLOSE_COMMENT = "</font>";
const string OPEN_KEYWORD = "<font color=\"#0000FF\">";
const string CLOSE_KEYWORD = "</font>";
const string OPEN_PREPROCESSOR = "<font color=\"#FF00FF\">";
const string CLOSE_PREPROCESSOR = "</font>";

//Alternating background
const string NEW_LINE[] = { "<li class=\"li l1\">", "<li class=\"li l2\">" };
bool lineOdd = true;
string getNewLineHTML() {
    lineOdd = !lineOdd;
    return "</li>" + NEW_LINE[lineOdd];
}

//Check if the character is in the string chars
bool inRange(char c, string chars) {
    for (size_t i = 0; i < chars.size(); i++)
        if (c == chars[i])
            return true;
    return false;
}

//Check if the character is the start of an identifier
bool isIdentifierStart(string input, int i) {
    if (i < 0 || i >= input.size())
        return false;
    return (input[i] == '_') || ('a' <= input[i] && input[i] <= 'z')
            || ('A' <= input[i] && input[i] <= 'Z');
}
//Check if the character is the continuation of an identifier
bool isIdentifierCont(string input, int i) {
    if (i < 0 || i >= input.size())
        return false;
    return ('0' <= input[i] && input[i] <= '9') || isIdentifierStart(input, i);
}
//Check if a[start + i] == b[i], i<b.size()
bool eqRange(string a, int start, string b) {
    if (a.size() - start < b.size())
        return false;
    return a.substr(start, b.size()) == b;
}
//Escape the sourcecode for HTML
string escape(char c) {
    switch (c) {
    case '&':
        return "&amp;";
    case '\"':
        return "&quot;";
    case '\'':
        return "&apos;";
    case '<':
        return "&lt;";
    case '>':
        return "&gt;";
    }
    //Is there a better way to do this?
    stringstream strm;
    strm << c;
    return strm.str();
}
string escape(string str) {
    string ans = "";
    for (size_t i = 0; i < str.size(); i++) {
        ans += escape(str[i]);
    }
    return ans;
}
string highlight(string input) {
    //The current state
    int state = NONE;
    //The current position
    size_t i = 0;
    //Styles
    string output =
            "<html><body><style type=\"text/css\">.l{list-style-type: decimal; margin-top: 0; margin-bottom: 0;} .li{ display: list-item; word-wrap: break-word;} .l1{background-color: #FFFFFF;} .l2{background-color: #EEEEEE;} .cd{white-space: pre;}</style><code class=\"cd\"><ul class=\"l\">";
    output += NEW_LINE[1];
    for (; i < input.size();) {
        switch (state) {
        case NONE: {
            if (input[i] == '#') { //Start a preprocessor statement
                state = PREPROCESSOR_UNCLOSED;
                output += OPEN_PREPROCESSOR + escape(input[i]);
                i++;
                break;
            }
            if (input[i] == '"') { //Start a string
                state = STRING_UNCLOSED;
                output += OPEN_STRING + escape(input[i]);
                i++;
                break;
            }
            if (input[i] == '\'') { //Start a character
                state = CHAR_UNCLOSED;
                output += OPEN_STRING + escape(input[i]);
                i++;
                break;
            }
            if (eqRange(input, i, "//")) { //Start a single line comment
                state = LINE_COMMENT_UNCLOSED;
                output += OPEN_COMMENT + escape(input.substr(i, 2));
                i += 2;
                break;
            }
            if (eqRange(input, i, "/*")) { //Start a multi-line comment
                state = MULTILINE_COMMENT_UNCLOSED;
                output += OPEN_COMMENT + escape(input.substr(i, 2));
                i += 2;
                break;
            }
            if (input[i] == '\n') { //New lines are special!
                output += getNewLineHTML();
                i++;
                break;
            }
            for (size_t j = 0; j < NUMBER_OF_KEYWORDS; j++) //Iterate through keywords
                if (eqRange(input, i, keywords[j])
                        && !isIdentifierCont(input, i + keywords[j].size())) { // The keyword can't be a prefix of an identifier, so test for that
                    output += OPEN_KEYWORD + escape(keywords[j])
                            + CLOSE_KEYWORD;
                    i += keywords[j].size();
                    break;
                }
            //Treat identifiers separately because we need to separate identifiers from keywords.
            if (isIdentifierStart(input, i)) {
                state = IDENTIFIER;
                output += escape(input[i]);
                i++;
                //If the next character is not a part of the identifier, go to the NONE state
                if (i + 1 == input.size() || !isIdentifierCont(input, i + 1)) {
                    state = NONE;
                }
                break;
            }
            //Other characters
            output += escape(input[i]);
            i++;
            break;
        }
        case CHAR_UNCLOSED: {
            if (eqRange(input, i, "\\")) { //Treat escape sequences inside quotes
                output += escape(input.substr(i, 2));
                i += 2;
                break;
            }
            if (input[i] == '\'') { //Close quote
                state = NONE;
                output += escape(input[i]) + CLOSE_STRING;
                i++;
                break;
            }
            output += escape(input[i]); //Other characters go into the literal
            i++;
            break;
        }
        case STRING_UNCLOSED: {
            if (eqRange(input, i, "\\")) { //Treat escape sequences inside quotes
                output += escape(input.substr(i, 2));
                i += 2;
                break;
            }
            if (input[i] == '"') { //Close quote
                state = NONE;
                output += escape(input[i]) + CLOSE_STRING;
                i++;
                break;
            }
            output += escape(input[i]); //Other characters go into the literal
            i++;
            break;
        }
        case LINE_COMMENT_UNCLOSED: {
            if (input[i] == '\n') { //Close comment with new line
                state = NONE;
                output += CLOSE_COMMENT + getNewLineHTML();
                i++;
                break;
            }
            output += escape(input[i]); //Comment body
            i++;
            break;
        }
        case MULTILINE_COMMENT_UNCLOSED: {
            if (eqRange(input, i, "*/")) { //Close multiline comment
                state = NONE;
                output += escape(input.substr(i, 2)) + CLOSE_COMMENT;
                i += 2;
                break;
            }
            if (input[i] == '\n') { //New lines are special!
                output += getNewLineHTML();
                i++;
                break;
            }
            output += escape(input[i]); //Comment body
            i++;
            break;
        }
        case PREPROCESSOR_UNCLOSED: {
            if (input[i] == '\n') { //Close preprocessor statement or go to next line
                int j = i - 1;
                for (; j >= 0 && inRange(input[j], "\n\t "); j--)
                    //Seek las non-whitespace character
                    ;
                if (j < 0 || input[j] != '\\') { //Check if the last non-whitespace character is a backslash
                    state = NONE; //... If it isn't, close the preprocessor statement
                    output += CLOSE_PREPROCESSOR + getNewLineHTML();
                    i++;
                    break;
                }
                output += getNewLineHTML(); //... If it is, we need to extend the preprocessor statement to the next line
                i++;
                break;
            }
            output += escape(input[i]);
            i++;
            break;
        }
        case IDENTIFIER: {
            //If the next character is not a part of the identifier, go to the NONE state
            if (i + 1 == input.size() || !isIdentifierCont(input, i + 1)) {
                state = NONE;
            }
            output += escape(input[i]);
            i++;
            break;
        }
        }
    }
    output += "</ul></code>";
    return output;
}
int main() {
    ifstream input("input.c");
    ofstream output("output.html");
    std::string cCode((std::istreambuf_iterator<char>(input)),
            std::istreambuf_iterator<char>());
    output << highlight(cCode) << endl;
}
nickguletskii
sumber
Apakah benar-benar perlu untuk menyoroti kata kunci kompiler. Mereka tidak standar. Tetapi jika Anda benar-benar ingin, anggap apa pun yang dimulai dengan __(dua garis bawah) sebagai kata kunci, seperti spesifikasi mengatakan itu dicadangkan untuk tujuan implementasi.
Konrad Borowski