<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Forte CynCity &#187; Mike Meredith</title>
	<atom:link href="http://cyncity.forteds.com/author/mikemeredith/feed/" rel="self" type="application/rss+xml" />
	<link>http://cyncity.forteds.com</link>
	<description></description>
	<lastBuildDate>Mon, 28 Mar 2011 15:38:43 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Modular Interfaces Part III: Custom Interfaces</title>
		<link>http://cyncity.forteds.com/2011/03/28/modular-interfaces-part-iii-custom-interfaces/</link>
		<comments>http://cyncity.forteds.com/2011/03/28/modular-interfaces-part-iii-custom-interfaces/#comments</comments>
		<pubDate>Mon, 28 Mar 2011 15:38:43 +0000</pubDate>
		<dc:creator>Mike Meredith</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Methodology]]></category>

		<guid isPermaLink="false">http://cyncity.forteds.com/?p=160</guid>
		<description><![CDATA[Continuing with our series on modular interfaces, today we&#8217;re going to talk about one of the most powerful features in Cynthesizer&#8211;the ability to create custom interfaces and use them at a high level.  These are more than just ready/valid handshakes or external memory interfaces like I showed you last time.  These are very [...]]]></description>
			<content:encoded><![CDATA[<p>Continuing with our series on modular interfaces, today we&#8217;re going to talk about one of the most powerful features in Cynthesizer&#8211;the ability to create custom interfaces and use them at a high level.  These are more than just ready/valid handshakes or external memory interfaces like I showed you last time.  These are very complex streaming and buffer-style interfaces that transfer entire data structures.</p>
<p>Anyone who has designed a complex interface in RTL knows that one of the biggest chores is keeping track of the details, and it’s difficult to reuse the work you did before if any of the interface details change in any way.   If the data is a multidimensional array, the interface needs to know things like the row and column parameters so the array is formatted properly.  If the data is moving from one thread or module to another, the interface needs to understand how to synchronize between them.  And most interfaces require some kind of internal memory to temporarily store data in case the reads and writes happen at different speeds.</p>
<p>All in all, when you design interfaces in RTL you spend all your time and many lines of code getting all these details resolved.  There’s no way to just say “take the data from here and put it over there.”</p>
<p>But in Cynthesizer there is, and it does it in a way that drastically reduces the amount of code you have to write and the time you spend verifying it. The keys to this are interface generation and packaging the details up so that you can write your algorithm at the transaction level. Keep reading and I’ll show you some hard data on that.</p>
<p><span style="color: #0000ff;"><strong>Interfaces &#8216;R Us</strong></span><br />
So I want to create an interface, a real complicated one.  What do I do?</p>
<p>Well, in Cynthesizer there’s a tool called the Interface Generator.  The Interface Generator is a combination of really sophisticated IP and a graphical editing window where you can create any custom, fully parameterized interface.  This interface then becomes a packaged component (actually a set of C++ classes) I can use in a high-level SystemC design.  Here’s a rundown of familiar interface types you can create in the Interface Generator:</p>
<ul>
<li>Buffer
<ul>
<li>Transfers data between two modules or threads through a shared buffer.</li>
</ul>
</li>
</ul>
<ul>
<li> Line Buffer
<ul>
<li>Transfers an array and stores multiple rows of data for reading.</li>
</ul>
</li>
</ul>
<ul>
<li> Circular Buffer
<ul>
<li>Transfers data over a circular buffer where the reading and writing operations are tightly synchronized.</li>
</ul>
</li>
</ul>
<ul>
<li> Streaming
<ul>
<li>Transfers an array of streaming data over one or more clock cycles.</li>
</ul>
</li>
</ul>
<ul>
<li> Trigger/Done
<ul>
<li>A master/slave architecture with acknowledgement signals at the beginning and end of data transfer.</li>
</ul>
</li>
</ul>
<ul>
<li> P2P Stream
<ul>
<li>A general form of Forte’s CynWare point-to-point interface with features for creating specialized fifos.</li>
</ul>
</li>
</ul>
<p>For each type of interface, the Interface Editor window in Interface Generator shows you a diagram of the read/write structure, a diagram of the access pattern for the datatype being transferred, and many related parameters that you can define for your needs.  Best of all, the interface you create will have transaction-level functions like <em>get()</em>, <em>put()</em>, <em>x_done()</em>, <em>next_y()</em>, etc. that let your algorithm execute entire interface accesses with a single function call.</p>
<p><strong><span style="color: #0000ff;">A Real Example</span></strong><br />
Let’s take a look at creating an interface part and using it in a SystemC algorithm.  I will describe what I want the interface to do in <em>words </em>because it would be too daunting to describe in RTL.</p>
<p>I want an interface that does the following:</p>
<ul>
<li>Allows a reader and a writer to share an array of 16-bit unsigned integers</li>
</ul>
<ul>
<li>Allows the writer to write to the array in groups of values, working from the beginning of the array to the end</li>
</ul>
<ul>
<li> Allows the reader to read the array in groups of values working from the beginning of the array to the end</li>
</ul>
<ul>
<li> Coordinates the activities of the reader and writer so that there is no need to store the whole array in memory or registers</li>
</ul>
<p>Okay, that’s the basic function and it seems easy enough.  But for something like this to work in the real world you need to cover that stuff I mentioned before—the details.  And the details are things like this:</p>
<ul>
<li> Implement the internal buffer to be 1024 words long</li>
</ul>
<ul>
<li> The writer puts the first group of input values (let’s say 2 at a time)  in the first two words of the buffer (words 0 and 1), puts the next group of inputs in the next two words of the buffer (words 2 and 3), etc.</li>
</ul>
<ul>
<li> The reader gets the first group of output values (let’s say 8 at a time) from the first eight words of the buffer (words 0-7), gets the next outputs by shifting once and reading the next eight words (words 1-8), etc.</li>
</ul>
<ul>
<li> When the writer has put inputs in the last two words of the buffer, circle around and put the next inputs in the first two words of the buffer</li>
</ul>
<ul>
<li> When the reader has grabbed outputs that go beyond the edge of the buffer, circle around and get the remaining values from the beginning of the buffer</li>
</ul>
<ul>
<li> Maintain input and output buffer pointers to make sure the correct buffer contents are read or written</li>
</ul>
<ul>
<li> Synchronize the interface at the beginning and the end of the algorithm’s execution</li>
</ul>
<ul>
<li> Synchronize the interface at the beginning and the end of any iterating loop in the algorithm</li>
</ul>
<ul>
<li> Fully handshake all interface accesses</li>
</ul>
<p>Suddenly this is not looking so easy.  If I were designing the old way I would have to create a buffer array, set up a bunch of pointer variables, make complicated address assignments to keep everything pointing to the right place, keep track of where I was in the memory to handle that circular wraparound requirement, and harness all reads and writes with some kind of ready/valid handshake.  Sigh.</p>
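<p>To give a feel for that bookkeeping, here is a minimal sketch in plain C++ (no SystemC, no handshaking) of a software model of such a buffer.  The class name <em>CircularWindowBuffer</em> and all of its members are hypothetical names chosen for illustration; this is not what the Interface Generator produces, only a rough picture of the pointer and wraparound logic that has to exist somewhere.</p>

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical software model of a hand-written circular buffer.
// Sizes follow the requirements above: a 1024-word buffer, writes in
// groups of 2, reads in windows of 8 that advance by 1 word per read.
class CircularWindowBuffer {
public:
    static constexpr std::size_t kDepth = 1024;
    static constexpr std::size_t kWriteGroup = 2;
    static constexpr std::size_t kReadWindow = 8;
    static constexpr std::size_t kReadShift = 1;

    // True when writing one more group would not overwrite unread words.
    bool can_write() const {
        return written_ + kWriteGroup <= consumed_ + kDepth;
    }

    // Store one group of two words and advance the write pointer,
    // wrapping to word 0 after the last word of the buffer.
    void write_group(std::uint16_t a, std::uint16_t b) {
        assert(can_write());
        mem_[wr_ptr_] = a;
        mem_[(wr_ptr_ + 1) % kDepth] = b;
        wr_ptr_ = (wr_ptr_ + kWriteGroup) % kDepth;
        written_ += kWriteGroup;
    }

    // True when the next 8-word window has been completely written.
    bool window_ready() const {
        return written_ >= consumed_ + kReadWindow;
    }

    // Copy the current window out, then shift the read pointer by one.
    std::array<std::uint16_t, kReadWindow> read_window() {
        assert(window_ready());
        std::array<std::uint16_t, kReadWindow> out{};
        for (std::size_t k = 0; k < kReadWindow; ++k) {
            out[k] = mem_[(rd_ptr_ + k) % kDepth];  // wraps past the edge
        }
        rd_ptr_ = (rd_ptr_ + kReadShift) % kDepth;
        consumed_ += kReadShift;
        return out;
    }

private:
    std::array<std::uint16_t, kDepth> mem_{};
    std::size_t wr_ptr_ = 0;    // next word the writer will fill
    std::size_t rd_ptr_ = 0;    // first word of the reader's window
    std::size_t written_ = 0;   // total words written so far
    std::size_t consumed_ = 0;  // total words the reader has moved past
};
```

<p>A caller would alternate <em>write_group()</em> and <em>read_window()</em>, checking <em>can_write()</em> and <em>window_ready()</em> first.  Even this simplified model needs two pointers, two counters and modulo arithmetic on every access, and it still leaves out the cycle-level synchronization between the two threads.</p>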
<p>But now I’m using Cynthesizer.  Here’s what the Interface Editor window looks like for an interface that does exactly what I need:</p>
<p><a href="http://cyncity.forteds.com/wp-content/uploads/2011/03/ifed.jpg"><img class="alignnone size-thumbnail wp-image-208" title="Interface Editor Window" src="http://cyncity.forteds.com/wp-content/uploads/2011/03/ifed-150x150.jpg" alt="" width="75" height="75" /></a></p>
<address>[Click to enlarge]</address>
<p>Note first I have chosen to create a circular buffer interface because of the wraparound requirement.  In the “Reader” and “Writer” sections I can choose the size of the working set on each side of the interface.  I need an array of two input values on the writer side and an array of eight output values on the reader side, so you see<em> 2</em> and <em>8 </em>in those boxes.  Also, the output pointer needs to shift by one when read, so I entered <em>1</em> in the “Adjustment” box.  Now all that’s left is to define what is transferred over this interface.  In the “Parameter” section I’ve specified a <em>sc_uint</em> datatype, which is the SystemC standard for an unsigned integer, and given it a width of <em>16</em> in the “#Bits” box.  Then I sized a “1D”, 1024-word  buffer for the needed internal storage.  Finally, I specify <em>RAM2 </em>as the memory part I want the interface to use to implement the buffer.  I created RAM2 myself previously using Cynthesizer’s Memory Editor window.  Please read <a href="http://cyncity.forteds.com/2011/01/27/modint-part-ii-extmem/" target="_self">Part II of this blog series</a> for information on creating memories.  I saved my interface as a part named <em>my_if</em>.<br />
So “my_if” now exists as an interface part in my library.  Let’s put it to good use in an algorithm that does something like this:</p>
<p><a href="http://cyncity.forteds.com/wp-content/uploads/2011/03/blog3.jpg"><img class="alignnone size-full wp-image-205" title="High-Level Algorithm" src="http://cyncity.forteds.com/wp-content/uploads/2011/03/blog3.jpg" alt="" width="440" height="358" /></a></p>
<p>My module DUT needs to do the following:</p>
<ul>
<li> Read an input port din, calculate two values and put them in the interface.  I’ll create a thread called <em>writer()</em> to do this.</li>
<li> Get eight values from the interface, make a calculation with them, and write the result of that calculation to an output port dout.  I’ll create a thread called <em>reader() </em>to do this.</li>
</ul>
<p>First, I have to define a DUT module, declare the ports and threads and instantiate my interface:</p>
<pre>SC_MODULE( dut ) {
    cynw_p2p&lt; sc_uint&lt;16&gt; &gt;::in		din;
    cynw_p2p&lt; sc_uint&lt;16&gt; &gt;::out	dout;
    <span style="color: #ff0000;">my_if::direct&lt;&gt;	 		m_if;</span>
    …
    void writer();
    void reader();
    …
    // Clocked threads: SC_CTHREAD takes the clock edge as its second argument
    SC_CTHREAD( writer, clk.pos() );
    SC_CTHREAD( reader, clk.pos() );
};</pre>
<p>In red I show an instance of my_if using the <em>::direct&lt;&gt;</em> template class.  This is one of the member classes in my_if, and it is used to represent an internal interface communicating directly between two threads.  Conversely, there is a <em>::chan&lt;&gt;</em> class to use when the interface is external and communicates between two modules.</p>
<p>Now in the DUT module we’ll define the writer() and reader() threads to interact with the interface.</p>
<pre>void writer()
{
    …
    <span style="color: #ff0000;">m_if.w_start_tx();</span>
    // 512 gets (working set of 2; fills 1024 length buffer)
    for( int i=0; i &lt; (1024/2); i++ ) {
        <span style="color: #ff0000;">m_if.w_start_iter();</span>
        vin = din.get();
        m_if[i*2] = vin;
        m_if[i*2+1] = vin+1;
        <span style="color: #ff0000;">m_if.w_end_iter();</span>
    }
   <span style="color: #ff0000;"> m_if.w_end_tx();</span>
}</pre>
<pre>void reader()
{
    …
    <span style="color: #ff0000;">m_if.r_start_tx();</span>
    // 1017 puts (working set of 8 with read adjustment of 1)
    for( int i = 0; i &lt; (1024-8)+1; i++ ) {
        <span style="color: #ff0000;">m_if.r_start_iter();</span>
        // first and last words of the current 8-word window
        v1 = m_if[i];
        v2 = m_if[i+8-1];
        val = (v1 + v2)/2;
        dout.put( val );
        <span style="color: #ff0000;">m_if.r_end_iter();</span>
    }
    <span style="color: #ff0000;">m_if.r_end_tx();</span>
}</pre>
<p>That&#8217;s it.  I&#8217;m done.</p>
<p>In red I have highlighted some more of the transaction-level member functions that save me a lot of time and keep my algorithm at a high level.  They are:</p>
<ul>
<li> <em>w_start_tx(), w_end_tx() </em>
<ul>
<li>Synchronizes the algorithm on the writing side of the interface</li>
</ul>
</li>
</ul>
<ul>
<li> <em>r_start_tx(), r_end_tx() </em>
<ul>
<li>Synchronizes the algorithm on the reading side of the interface</li>
</ul>
</li>
</ul>
<ul>
<li> <em>w_start_iter(), w_end_iter() </em>
<ul>
<li>Synchronizes a loop iteration on the writing side of the interface</li>
</ul>
</li>
</ul>
<ul>
<li> <em>r_start_iter(), r_end_iter() </em>
<ul>
<li>Synchronizes a loop iteration on the reading side of the interface</li>
</ul>
</li>
</ul>
<p style="padding-left: 30px;">
<p>So with the Interface Generator I defined a few parameters and instantly had a SystemC interface part loaded with powerful classes and functions, and using them I was able to describe the algorithm at the transaction level in just <strong>58</strong> lines.  This same design in RTL would have taken well over <strong>4600</strong> lines!</p>
<p style="padding-left: 30px;">
<p>Next time, we&#8217;ll conclude our series on modular interfaces with some final suggestions of things you should consider.</p>
]]></content:encoded>
			<wfw:commentRss>http://cyncity.forteds.com/2011/03/28/modular-interfaces-part-iii-custom-interfaces/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Modular Interfaces Part II: External Memories</title>
		<link>http://cyncity.forteds.com/2011/01/27/modint-part-ii-extmem/</link>
		<comments>http://cyncity.forteds.com/2011/01/27/modint-part-ii-extmem/#comments</comments>
		<pubDate>Thu, 27 Jan 2011 19:09:15 +0000</pubDate>
		<dc:creator>Mike Meredith</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://cyncity.forteds.com/?p=118</guid>
		<description><![CDATA[In the opening entry of this series, we introduced modular interfaces and talked about their importance in high-level synthesis (HLS).  We touched on the ways that other HLS tools force you to create interfaces&#8211;with limiting approaches like ANSI C or off-the-shelf IP&#8211;and contrasted that against the strengths of our Cynthesizer tool.  By designing [...]]]></description>
			<content:encoded><![CDATA[<p>In the opening entry of this series, we introduced modular interfaces and talked about their importance in high-level synthesis (HLS).  We touched on the ways that other HLS tools force you to create interfaces&#8211;with limiting approaches like ANSI C or off-the-shelf IP&#8211;and contrasted that against the strengths of our Cynthesizer tool.  By designing with Cynthesizer using SystemC, we showed examples of how the verification, standardization and customization of interfaces are all made easier.</p>
<p>The second part of our series on modular interfaces deals with external memories.  It&#8217;s unavoidable when you’re designing at an abstract level&#8211;at some point you&#8217;re going to need a memory.  High level algorithms, by nature, make extensive use of loops that operate on arrayed datatypes.  Not all HLS tools handle external memories well.  With some you have to schedule the memory interface by hand!  There are several different ways that Cynthesizer can implement those arrays for you, but it is when they become external memories that Cynthesizer’s modularity really shines.</p>
<p><strong><span style="color: #0000ff;">Arrays In HLS</span></strong><br />
Let’s say you have an array in your design, maybe something like this:</p>
<pre>sc_uint&lt;16&gt; mem[16];
…
for ( int i = 0; i &lt; 16; i++)
{
    acc = acc + coeff * mem[i];
}</pre>
<p>What can you do with it?  Well, in Cynthesizer there are three things you can do.  You can:</p>
<ul>
<li><em>Flatten it</em>
<ul>
<li>Using a directive or command-line switch, Cynthesizer will flatten the array into individual registers.  This is helpful if you want a short latency—your architecture will have access to all array locations in the fewest possible cycles.</li>
</ul>
</li>
<li><em>Make it an <strong>internal </strong>memory</em>
<ul>
<li>Using Cynthesizer’s Memory Model Editor, you can create a memory model that Cynthesizer will allocate to store your array.  This is good if you have a few extra cycles to spare and don’t want the added muxing of a flattened architecture.</li>
</ul>
</li>
<li><em>Make it an <strong>external </strong>memory</em>
<ul>
<li>Using the same memory model, you can instruct Cynthesizer to implement the memory externally.  Cynthesizer will synthesize an interface to this memory for you.  You will not have to worry about controlling the address, data or enable ports.</li>
</ul>
</li>
</ul>
<p>It is the ease of making a memory external (and the lack of changes required to your SystemC source code) that we will get into now.</p>
<p><strong><span style="color: #0000ff;">Going External</span></strong><br />
So you want your array to be an external memory?  First thing you need to do is build it.  To do that you can invoke the Memory Model Editor in the Cynthesizer Workbench GUI.  Here’s what it looks like:</p>
<p><a href="http://cyncity.forteds.com/wp-content/uploads/2011/01/mem_edit.jpg"><img class="size-thumbnail wp-image-132 alignnone" title="Cynthesizer Memory Model Editor" src="http://cyncity.forteds.com/wp-content/uploads/2011/01/mem_edit-150x150.jpg" alt="" width="60" height="60" /></a></p>
<p><em>[Click to enlarge]</em></p>
<p>There’s actually not that much you have to do here.  We set “Word Size:” and “Number of Words:” both to 16 to match our array, which holds 16 elements of 16 bits each.  The “Latency:” is 1 by default, and the “Setup time:” and “Output Delay:” should be a nonzero time value based on the data from your technology library. And since this memory will be external, the “Area:” is 0.</p>
<p>By default this will be a single-port memory.  If you want something different you can click the “Ports” tab, where you can increase the number of ports, edit the names of the address and data lines, specify any enable lines or masking, associate each port with a clock or reset, and configure the ports as read-only, write-only or both.</p>
<p>But the real power for us is in the “Internal memories” and “External memories” boxes where you can specify chaining or registering of the memory I/O.  That’s right, the SystemC memory model we create will be usable as <em>either </em>an internal or external memory.</p>
<p>So once we’ve generated the memory model (I called it “mem_part”), how do we use it?  All you have to do is turn your array declaration into a memory port instantiation.  Previously we had this:</p>
<pre>sc_uint&lt;16&gt; mem[16];</pre>
<p>now we have this:</p>
<pre>mem_part::port&lt;ioConfig, sc_uint&lt;16&gt; &gt;  mem;</pre>
<p><em>port&lt;&gt;</em> is a templated class in the generated SystemC memory model that represents an external port connection.  Its arguments are the I/O configuration, which can be defined for TLM or pin-level simulation, and its datatype.  The memory model will also have to be connected to clock and reset but there are high-level functions available to do that.  Why, you ask, do we have to do all this?  Chances are you chose an external memory because it needs to be shared with other modules.  The classes like port&lt;&gt; make it possible to accurately simulate intermodule communication through a shared memory at the <em>behavioral </em>level.  With other tools that use ANSI C or don&#8217;t have customizable interfaces, you have to wait till you have RTL to verify anything like this.</p>
<p>What do you have to do to the source code?  Well, nothing:</p>
<pre>for ( int i = 0; i &lt; 16; i++)
{
    acc = acc + coeff * mem[i];
}</pre>
<p>The array access looks exactly the same as before. Cynthesizer knows to associate your array access with an external memory port, and it schedules the interface automatically based on your latency constraints.  The external memory has become a truly modular interface, and it was all created for you.  If you generate RTL for this design you will see the synthesized interface in the port list:</p>
<pre>module dut(clock, RSTN, inp_busy, inp_vld, inp_data, outp_busy,
           outp_vld, outp_data, <span style="color: #ff0000;">mem_WE0, mem_DINw, mem_DOUTw,
           mem_Aw, mem_REQ0</span>);
  input clock;
  input RSTN;
  …
  input [15:0] <span style="color: #ff0000;">mem_DOUTw</span>;
  output <span style="color: #ff0000;">mem_WE0</span>;
  reg <span style="color: #ff0000;">mem_WE0</span>;
  output [15:0] <span style="color: #ff0000;">mem_DINw</span>;
  output [3:0] <span style="color: #ff0000;">mem_Aw</span>;
  reg [3:0] <span style="color: #ff0000;">mem_Aw</span>;
  output <span style="color: #ff0000;">mem_REQ0</span>;
  …</pre>
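<p>The idea that the loop does not change when the storage changes can be sketched in plain C++ as well.  This is only a software analogy under stated assumptions: <em>ExternalMemModel</em>, <em>accumulate()</em> and <em>address_bits()</em> are hypothetical names, not part of the generated memory model.  The class overloads <em>operator[]</em> so the same loop body compiles against an ordinary array or against an object that funnels every read through one access point, and the helper shows why a 16-word memory ends up with a 4-bit address bus like <em>mem_Aw</em> above.</p>

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an external memory port: it exposes
// operator[] like an array, but every read goes through one function
// where a real port would drive address and request pins.
class ExternalMemModel {
public:
    explicit ExternalMemModel(const std::array<std::uint16_t, 16>& init)
        : words_(init) {}

    std::uint16_t operator[](std::size_t addr) {
        assert(addr < words_.size());
        ++read_count_;  // stands in for one memory request
        return words_[addr];
    }

    std::size_t read_count() const { return read_count_; }

private:
    std::array<std::uint16_t, 16> words_;
    std::size_t read_count_ = 0;
};

// The multiply-accumulate loop from the post, written once and usable
// with either storage type.
template <typename Mem>
std::uint32_t accumulate(Mem& mem, std::uint32_t coeff) {
    std::uint32_t acc = 0;
    for (int i = 0; i < 16; i++) {
        acc = acc + coeff * mem[i];
    }
    return acc;
}

// Number of address bits needed for a memory with `words` locations.
inline unsigned address_bits(std::size_t words) {
    unsigned bits = 0;
    while ((std::size_t{1} << bits) < words) {
        ++bits;
    }
    return bits;
}
```

<p>Calling <em>accumulate()</em> on a <em>std::array</em> and on an <em>ExternalMemModel</em> holding the same values gives the same result, with the model recording 16 reads.  In the real flow it is the tool, not a C++ class, that turns those reads into scheduled pin activity.</p>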
<p>Next time, we&#8217;ll take a look at the techniques used to build custom modular interfaces using Cynthesizer&#8217;s Interface Editor.</p>
]]></content:encoded>
			<wfw:commentRss>http://cyncity.forteds.com/2011/01/27/modint-part-ii-extmem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Modular Interfaces  Part I: Benefits</title>
		<link>http://cyncity.forteds.com/2011/01/06/modular-interfaces-part-i-benefits/</link>
		<comments>http://cyncity.forteds.com/2011/01/06/modular-interfaces-part-i-benefits/#comments</comments>
		<pubDate>Fri, 07 Jan 2011 00:15:59 +0000</pubDate>
		<dc:creator>Mike Meredith</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Methodology]]></category>

		<guid isPermaLink="false">http://cyncity.forteds.com/?p=101</guid>
		<description><![CDATA[High-level synthesis (HLS) is just that&#8211;high level&#8211;a design approach that lets you work at a level above having to wade through pins and wires and state machines.  There are many factors to consider in choosing an HLS tool, but one of them is so fundamental that it often gets overlooked.
It&#8217;s interfaces. I&#8217;m not just [...]]]></description>
			<content:encoded><![CDATA[<p>High-level synthesis (HLS) is just that&#8211;high level&#8211;a design approach that lets you work at a level above having to wade through pins and wires and state machines.  There are many factors to consider in choosing an HLS tool, but one of them is so fundamental that it often gets overlooked.</p>
<p>It&#8217;s interfaces. I&#8217;m not just talking about declaring ports and hooking them up. I&#8217;m talking about high-level, modular interfaces that encapsulate very complex I/O protocols into easy-to-use function calls. Most of the tools out there offer some kind of interface solution, but we think they miss the mark in giving you what you need to be successful.  Some let you describe I/O behavior with standard ANSI C only to add the actual pin-level interface details later in synthesis (which, unfortunately, will be the first time you can actually verify the interface).  Some offer only off-the-shelf interface IP but then provide no means of creating custom interfaces.  We developed Cynthesizer with all of this in mind.  Cynthesizer designs are written in SystemC, where the clock and pin level activity of an interface can be simulated before the tool is run.  And Cynthesizer has a family of standard interface IP in addition to an editor where you can create any interface you want.</p>
<p>This is the first in a four-part series on designing and working with modular interfaces in Cynthesizer. Today I&#8217;ll be detailing the benefits of modular interfaces. In the coming weeks we&#8217;ll be talking about how useful they are with external memories, we&#8217;ll detail some of the techniques we use to build them, and we&#8217;ll show what exactly is available interface-wise from Forte.</p>
<p><span style="color: #0000ff;"><strong>What Is A Modular Interface?</strong></span><br />
When we talk to customers or prospects we use the word &#8220;transaction&#8221; a lot. And this is probably the best word to use in describing a modular interface. A transaction, in essence, is what is being communicated across a modular interface. A modular interface can be, for example, a burst write to a standard AHB bus model. Or it can be an exchange of data that follows a strict protocol in a fixed number of clock cycles. Or it can simply be the writing of a vector or datatype value with ready/valid handshaking. The modular interface combines the I/O protocol of your transaction (i.e. ports) with the actual functionality of the transaction. Whatever the case, to you as a user it is a single function call alongside your other high-level code.</p>
<p><strong><span style="color: #0000ff;">Why Use Them?</span></strong><br />
There are many benefits to designing this way. Your code is much simpler and you can describe an algorithm in fewer total lines. Your design effort can concentrate solely on the core of the algorithm at transaction level, all while retaining the flexibility to write custom interface protocols. Your connections become easier because all the pins involved in the interfaces are encapsulated in high-level channels. But the real benefit is in verification.  Since handshaking is built into a modular interface, all you need is one testbench to verify all of the RTL architectures your HLS tool produces.  Also, as mentioned above, Cynthesizer interfaces correctly simulate the interaction of their clocks and pins at the behavioral level, before you run any synthesis.  Remember that SystemC supports both TLM and pin-level I/O configurations.  We can&#8217;t state enough how critical this is to your HLS success&#8211;once your interface is verified behaviorally, it stays verified all the way down to gates or anywhere you reuse it.</p>
<p>Without modular interfaces you would spend a lot of time down in the trenches of I/O protocol design. You would have to declare whatever ports are needed for your transaction, make sure the direction of the ports was correct, declare signals in the parent module to connect them with, and then make sure all the connections are correct. But then comes the hard part, writing some kind of handshaking or acknowledgement scheme so that you only read input data when it is valid and only write output data when the downstream module is ready for it. This means a manual assertion of a ready signal, followed by a loop that sits and waits on a valid signal, followed by reading the data and storing it properly. And remember, you will repeat this for every interface you have.</p>
<p>Just look at the difference of this code with modular interfaces:</p>
<pre style="padding-left: 30px;">in_data = inp.get();
out_data = my_function( in_data );
outp.put( out_data );</pre>
<p>as compared to the same code written <em>without </em>modular interfaces:</p>
<pre style="padding-left: 30px;">inp_rdy = 1;
do {
    wait();
} while( !inp_vld );
in_data = inp.read();
inp_rdy = 0;
out_data = my_function( in_data );
do {
    wait();
} while( !outp_rdy );
outp.write( out_data );
outp_vld = 1;
wait();
outp_vld = 0;</pre>
<p>And this is only for a basic ready/valid handshake.  As the interface becomes more complex, the difference in the amount of code becomes more dramatic.</p>
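<p>How the handshake gets folded into a single call can be sketched in plain C++, without SystemC.  Everything here is an assumption made for illustration: <em>InputPins</em>, <em>InputPort</em> and the <em>wait</em> callback are hypothetical names, and this is not how the CynWare classes are implemented.  The callback stands in for a clock-edge wait, so the example can be stepped in an ordinary program.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical pin bundle for one ready/valid input.
struct InputPins {
    bool rdy = false;        // driven by the consumer
    bool vld = false;        // driven by the producer
    std::uint16_t data = 0;  // driven by the producer
};

// Wraps the handshake so the algorithm only ever calls get().
// `wait` stands in for a clock-edge wait: each call advances the
// surrounding model by one cycle and may update the pins.
class InputPort {
public:
    InputPort(InputPins& pins, std::function<void()> wait)
        : pins_(pins), wait_(std::move(wait)) {}

    std::uint16_t get() {
        pins_.rdy = true;      // announce that we can accept data
        do {
            wait_();           // one clock cycle
            ++cycles_waited_;
        } while (!pins_.vld);  // stall until the producer is valid
        std::uint16_t value = pins_.data;
        pins_.rdy = false;
        return value;
    }

    unsigned cycles_waited() const { return cycles_waited_; }

private:
    InputPins& pins_;
    std::function<void()> wait_;
    unsigned cycles_waited_ = 0;
};
```

<p>With a wrapper like this, the calling code shrinks to <em>in_data = inp.get();</em> and the ready/valid sequencing lives in one place, where it can be verified once and reused.</p>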
<p><strong><span style="color: #0000ff;">How Does SystemC Help?</span></strong><br />
We&#8217;ve touched on it already, but SystemC really lends itself to modular interface design.  A SystemC modular interface has two sides: a tidy, transaction-level side like the <em>.get()</em> and <em>.put()</em> calls you see above; and a rougher, pin-level side where the gritty details of the cycle-accurate protocol are defined.  Some people criticize SystemC because the OSCI synthesizable subset requires modules to have pin-level ports like <em>sc_in&lt;&gt;</em> and <em>sc_out&lt;&gt;</em>, but in this case it&#8217;s an advantage.  While you as a designer work at a high level, it is the pin-level ports that are presented to your HLS tool.  With the pins broken out, the tool can more optimally combine your interface protocol and datapath with the control FSM.  Using modular interfaces does not mean sacrificing quality or reusability.</p>
<p>Next time, we&#8217;ll get into designing with external memories&#8211;a specific case where modular interfaces save loads of time and effort.</p>
]]></content:encoded>
			<wfw:commentRss>http://cyncity.forteds.com/2011/01/06/modular-interfaces-part-i-benefits/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hierarchy in SystemC: Why it&#8217;s so important for HLS!</title>
		<link>http://cyncity.forteds.com/2010/03/02/hierarchy-in-systemc-why-its-so-important-for-hls/</link>
		<comments>http://cyncity.forteds.com/2010/03/02/hierarchy-in-systemc-why-its-so-important-for-hls/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 04:41:44 +0000</pubDate>
		<dc:creator>Mike Meredith</dc:creator>
				<category><![CDATA[General/Misc.]]></category>
		<category><![CDATA[Methodology]]></category>

		<guid isPermaLink="false">http://cyncity.forteds.com/2010/02/28/hierarchy-in-systemc-why-its-so-important-for-hls/</guid>
		<description><![CDATA[Last time, I looked at the verification advantages of using SystemC for HLS.  This time, I want to explore another important capability of SystemC that makes it far superior to ANSI C for hardware design.
I&#8217;m talking about structural hierarchy. SystemC supports hierarchy while ANSI C does not.  Structural hierarchy means submodules, connected together [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://cyncity.forteds.com/2010/02/22/need-another-reason-to-use-systemc-for-hls-the-verification-advantage-is-the-best-of-all/">Last time</a>, I looked at the verification advantages of using SystemC for HLS.  This time, I want to explore another important capability of SystemC that makes it far superior to ANSI C for hardware design.</p>
<p>I&#8217;m talking about structural hierarchy. SystemC supports hierarchy while ANSI C does not.  Structural hierarchy means submodules, connected together and executing concurrently. Use of hierarchy is the traditional mainstay of hardware designers for breaking down complex designs into a group of smaller, more manageable designs that are easier to design and verify.</p>
<p>Here are some of the advantages that come from SystemC&#8217;s support for hierarchy:</p>
<p><strong><span style="color: #0000ff;">Unit-Level Verification</span></strong><br />
It&#8217;s easier to build a testbench that can stimulate all the critical corner cases in a design when it&#8217;s a smaller block.  Also, signoff requirements like code coverage are easier to meet when you have controllability at the smaller block boundary.  Making sure that every line of source code is covered requires a designer to find the correct sequence of input values to exercise every code branch.  With a small block you don&#8217;t have to try to trick the upstream blocks into producing the right stimulus, you can just have the testbench present whatever stimulus you need.  In large blocks this may be impossible altogether.</p>
<p><strong><span style="color: #0000ff;">Connections and Interfaces</span></strong><br />
Working at the submodule level allows designers to isolate the complex interfaces and channel connections between them.  ANSI C HLS tools, as mentioned in the previous post, do not accurately represent concurrent hardware. This can really cause problems designing and verifying interfaces because the handshaking and transactions all occur simultaneously. The simulation semantics of SystemC let you examine interfaces and channels at the pin-level and make it straightforward to code your own interfaces with whatever protocol you need. SystemC-based HLS also allows you to encapsulate the details of a particular protocol in a set of classes and easily switch from one interface to another without changing your module&#8217;s source code&#8211;but that&#8217;s the topic of a future posting!</p>
<p><strong><span style="color: #0000ff;">Architecture Design</span></strong><br />
In HLS design there is a step between algorithmic design and RTL generation: architectural design. This step takes the untimed algorithmic code and decides how major portions will be implemented in hardware to best meet QoR requirements, e.g. whether a particular array should become an external memory or be flattened into registers. Some of this is done through synthesis directives or constraints, but a handy tactic is to partition a section of code into its own hierarchical submodule.  This, along with its verification advantage, is discussed further in <a href="http://www.edadesignline.com/howto/222900653;jsessionid=5YNCEVM025M2FQE1GHPCKH4ATMY32JVN">John Sanguinetti&#8217;s recent EETimes EDA DesignLine article</a>.</p>
<p><strong><span style="color: #0000ff;">Faster Runtimes</span></strong><br />
Everything will run faster with smaller modules.  The blocks will get through behavioral synthesis scheduling more quickly and the generated RTL will run through logic synthesis tools faster.  The block-level testing will go faster because smaller blocks compile and run in simulators faster as well.</p>
<p><span style="color: #0000ff;"><strong>Teams of Multiple Designers</strong></span><br />
Design teams usually work in parallel, with multiple designers building and verifying multiple blocks at the same time. Hierarchy makes design maintenance easier, makes it possible to keep a consistent set of code for HLS and verification, and keeps designers from stepping on each other&#8217;s toes.</p>
<p><span style="color: #0000ff;"><strong>Reuse</strong></span><br />
When the next-generation design is derived from your current design, having the design broken into manageable blocks makes it much easier to reuse some of those blocks without an entirely new verification effort. It also improves your ability to figure out which blocks will have to be changed, or how to fit together a combination of old, new and modified blocks to meet the new requirements.</p>
<p>SystemC supports our familiar friend&#8211;structural hierarchy&#8211;and allows you to use many of the same techniques you are accustomed to for managing the complexity of design and verification tasks. Gee, using SystemC for HLS is just like having a real hardware language, only with higher levels of abstraction available.  No wait, it&#8217;s not like that&#8211;it&#8217;s <em>exactly</em> that!  And that&#8217;s what you really need for practical high-level hardware design.</p>
]]></content:encoded>
			<wfw:commentRss>http://cyncity.forteds.com/2010/03/02/hierarchy-in-systemc-why-its-so-important-for-hls/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Need another reason to use SystemC for HLS?  The verification advantage is the best of all.</title>
		<link>http://cyncity.forteds.com/2010/02/22/need-another-reason-to-use-systemc-for-hls-the-verification-advantage-is-the-best-of-all/</link>
		<comments>http://cyncity.forteds.com/2010/02/22/need-another-reason-to-use-systemc-for-hls-the-verification-advantage-is-the-best-of-all/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 21:42:15 +0000</pubDate>
		<dc:creator>Mike Meredith</dc:creator>
				<category><![CDATA[General/Misc.]]></category>
		<category><![CDATA[Methodology]]></category>

		<guid isPermaLink="false">http://cyncity.forteds.com/2010/02/22/need-another-reason-to-use-systemc-for-hls-the-verification-advantage-is-the-best-of-all/</guid>
		<description><![CDATA[The &#8220;language war&#8221; in high-level synthesis (HLS) design has been raging for a while now.  You&#8217;ve probably read a lot of online publications touting the advantages of using SystemC over ANSI C to design at an abstract level. If you were to take what everyone is saying and boil it down to a few [...]]]></description>
			<content:encoded><![CDATA[<p>The &#8220;language war&#8221; in high-level synthesis (HLS) design has been raging for a while now.  You&#8217;ve probably read a lot of online publications touting the advantages of using SystemC over ANSI C to design at an abstract level. If you were to take what everyone is saying and boil it down to a few key points, they might sound something like this:</p>
<ul>
<li>ANSI C is a sequential language.</li>
<li>ANSI C cannot execute two subroutines or functions concurrently.</li>
<li>ANSI C executes your code in a single flow, one line after another.</li>
<li>ANSI C-based HLS tools either limit themselves to single-block designs or provide a proprietary mechanism to mimic concurrency.</li>
<li>ANSI C HLS tools don&#8217;t give you a way to accurately simulate what a real piece of hardware does.</li>
</ul>
<ul>
<li>SystemC, on the other hand, is a standardized superset of C++ that supports multiple concurrent processes.</li>
<li>SystemC supports hierarchy and modules.</li>
<li>SystemC allows communication between those modules at the transaction or pin level.</li>
<li>SystemC gives hardware designers the ability to tackle the very complex design and interface tasks they face every day.</li>
</ul>
<p>Notice that I said &#8220;superset of C++&#8221; instead of &#8220;language.&#8221;  That&#8217;s because SystemC is not a language in its own right: it&#8217;s a family of C++ classes specifically geared toward hardware design constructs like modules, ports, concurrency, clocks, resets and channels.</p>
<p>The list above is short, but to see what far-reaching consequences these points have for hardware designers, consider the following:</p>
<p>Let&#8217;s say you use an ANSI C tool that can only work with single blocks.  If you have a multiple block system, you can produce each block one at a time.  But there&#8217;s a catch: how do you verify that system?  According to the tool, that&#8217;s your problem.  It&#8217;s up to you to somehow stitch these blocks together in RTL and do all the verification in RTL.</p>
<p>If you use an ANSI C tool that mimics concurrent simulation using proprietary libraries or other non-standard techniques, you are locked into that proprietary flow.  Want to drive yourself crazy?  Just try using this flow to create some IP cells and distribute them to your customers.  Those customers will have to use (and own) the same tool just to run a simulation.</p>
<p>The proprietary library approach also takes liberties to get your ANSI C code to act like real hardware.  To emulate the dataflow of your design they will typically have you partition the design so each block is a subroutine, and require you to call proprietary APIs inside the subroutines to manage that dataflow. One common approach is to have you conditionally execute the algorithms inside the subroutines depending on the availability of data reported by these API calls.</p>
<p>Sound messy?  It is.  And when you consider that it takes an API call (also proprietary) to determine if there is data in the channel, then you start to understand how these libraries are rudimentary static simulators at best.</p>
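<p>To make the dataflow style described above concrete, here is a minimal sketch in plain C++. Everything in it is an assumption for illustration: the <code>Channel</code> class, <code>has_data()</code>, and the two block functions are hypothetical names, not any real tool&#8217;s API. Each block is an ordinary subroutine that only does work when the channel reports data, and a sequential loop stands in for concurrency.</p>

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical stand-in for a vendor channel API (not any real tool's library).
template <typename T>
class Channel {
public:
    bool has_data() const { return !fifo_.empty(); }
    void write(const T& v) { fifo_.push(v); }
    T read() {
        assert(!fifo_.empty());  // callers are expected to check has_data() first
        T v = fifo_.front();
        fifo_.pop();
        return v;
    }
private:
    std::queue<T> fifo_;
};

// Each "block" is a subroutine that executes its algorithm only when
// the channel API reports that input data is available.
void scale_block(Channel<int>& in, Channel<int>& out) {
    if (!in.has_data()) return;
    out.write(in.read() * 2);
}

void offset_block(Channel<int>& in, Channel<int>& out) {
    if (!in.has_data()) return;
    out.write(in.read() + 1);
}

// The "simulator" is just a loop that calls the blocks in a fixed order.
// Nothing here runs concurrently; the ordering is decided by the caller.
std::vector<int> run_pipeline(const std::vector<int>& samples) {
    Channel<int> a, b, c;
    for (int s : samples) a.write(s);
    std::vector<int> results;
    while (a.has_data() || b.has_data() || c.has_data()) {
        scale_block(a, b);
        offset_block(b, c);
        if (c.has_data()) results.push_back(c.read());
    }
    return results;
}
```

<p>Calling <code>run_pipeline({1, 2, 3})</code> pushes each sample through both blocks in turn. Note that the call order inside the loop is fixed by whoever writes it, which is the sense in which this style behaves like a static schedule, not an event-driven simulation.</p>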
<p>If you have a fairly simple design that has a sequential algorithm, can be controlled by a single finite state machine, processes the same number of inputs and outputs every cycle, and uses a limited set of interfaces to communicate, then ANSI C may work fine for you.  But the truth is that real projects aren&#8217;t that simple.  Complex designs have multiple processes operating concurrently, communicate with things like external memories, and send data through interfaces that must be customizable.</p>
<p>If you use the SystemC classes for HLS design, you have a real built-in, event-driven simulator to support your verification efforts. SystemC supports multiple modules that can execute concurrently, can share data in memories, and can synchronize their execution using real signal-level protocols.  In other words, it allows you to attack the biggest design problems by breaking them down into multiple blocks and connecting them with channels that follow a protocol.</p>
<p>And with SystemC, you can easily simulate these blocks together to make sure everything is working correctly.</p>
<p>Can you do all this with ANSI C? You could, but how much time are you willing to spend writing the specialized hardware classes that already exist with SystemC?</p>
<p>There&#8217;s a lot more I could say about this, but I&#8217;ll let someone else do it for me.  John Sanguinetti, Forte&#8217;s CTO, just published <a title="High-level synthesis, verification and language" href="http://www.edadesignline.com/howto/222900653;jsessionid=5YNCEVM025M2FQE1GHPCKH4ATMY32JVN" target="_blank">this article</a> in EETimes&#8217; EDA DesignLine that takes another look at the verification angle.</p>
<p>Next time, I&#8217;ll take a deeper look at another important advantage of SystemC in HLS design: its support of <a href="http://cyncity.forteds.com/2010/02/28/hierarchy-in-systemc-why-its-so-important-for-hls/">structural hierarchy</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://cyncity.forteds.com/2010/02/22/need-another-reason-to-use-systemc-for-hls-the-verification-advantage-is-the-best-of-all/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

